
Big Data Challenges and Achievements: Applications on Smart Cities and Energy Sector


Tareq Abed Mohammed
College of Computer Science and Information Technology
University of Kirkuk
Kirkuk, Iraq
July 2019
tareq.mahammed@uokirkuk.edu.iq

Ahmed Ghareeb
Infrastructure and Environmental Systems (INES) Program
University of North Carolina at Charlotte
Charlotte, USA
aghareeb@uncc.edu

Hussein Al-bayaty
College of Engineering
University of Kirkuk
Kirkuk, Iraq
July 2019
dr.hussein@uokirkuk.edu.iq

Shadi Aljawarneh
Jordan University of Science and Technology, Irbid, Jordan
saaljawarneh@just.edu.jo

ABSTRACT
This paper analyzes the challenges of big data and big data processing. Big data challenges have recently received great attention, largely due to the wide spread of real-life applications and systems that involve the presentation, modeling, processing, and storage of large (often unbounded) data. The topics considered are big data surveying, OLAP over big data, big data dissemination, and big data protection. We focus on research trends in these areas and outline a future research challenge in this area.

Keywords
Big Data, OLAP, Data Processing, Data Mining, Machine Learning.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DATA’19, December 2–5, 2019, Dubai, United Arab Emirates.
Copyright 2019 Association for Computing Machinery. ACM ISBN 978-1-4503-7284-8/19/12…$15.00.
DOI: http://doi.org/10.1145/3368691.3368717

1. INTRODUCTION
Nowadays, great attention is being given to big data (e.g., [2, 9, 16]), largely due to the wide range of research problems associated with practical systems and plans, such as modeling, distributed large-scale database storage, and mining. The term "big data" refers to data sets that are not based on a human source of information and are collected from different inputs, such as the Internet and social media (e.g., [13]). The data underlying all of these application scenarios share a few common features: (i) large-scale data, with general programs being redistributed; (ii) extensive issues that reflect the use of massive data pools (i.e., recent big data), where the volume of incoming data leads to considerable growth; and (iii) support for extract-transform-load (ETL) processes.

Among the research topics on big data, big data analytics (e.g., [15]) plays a leading role in the development of big data and mining. The analysis can be carried out over large, complex repositories, and its main purpose is to find useful information stored in these storage environments. Scientists have analyzed big data sets. One of the most important application areas generating big data is undoubtedly mathematics and computer science. Studies and researchers in different fields (e.g., chemistry, physics, astronomy, biology, biomedicine) produce a large amount of data daily, but the results remain in information-based documents, and there are no actual database management rules for analytics methods. From a research point of view, two major challenges are emerging. The first is the transport problem: big data is stored in various private and personal resources (legacy systems, the Web, scientific data stores, sensor databases, publications, social networks, etc.). The second is that graphs, plans, and similar artifacts are used for decision making, so business intelligence (BI) components must effectively implement complex analysis to manage structured data storage, recycling, and transformation.

Research on big data and big data analytics involves three important topics discussed in this article: OLAP over big data, big data presentation, and big data confidentiality (e.g., [3, 7, 11, 12]). These topics lead to realistic research problems that will interest research communities in the near future. Following this trend, the article focuses on the above-mentioned and related research topics and indirectly defines a research agenda that addresses future challenges in the big data era [20-31].

2. Big Data Storing and Processing
In the actual flow of big data application scenarios (e.g., [8]), records are collected, extracted, converted, loaded, and stored, and are then analyzed using OLAP (e.g., [1, 15]). Data warehousing and OLAP are classic research domains to which the database and data warehouse research community has referred for several years. They have been contextualized in a large family of data types, from classic relational data sets (e.g., [2]) to graphical data (e.g., [7]), XML data (e.g., [5]), streaming data, and new social network data (e.g., [8], [10, 11, 12]).

With the emergence of big data research, the problem of using OLAP cubes over big data is naturally one of the most interesting problems in this research area and one of the most powerful technological advances achievable in real-life, data-intensive applications and systems.
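As a minimal sketch of why computing OLAP cubes becomes costly, the following Python example builds every group-by (cuboid) of a tiny fact table. The table contents, the dimension names, and the `build_cube` helper are illustrative assumptions and are not taken from any system cited in this paper. The only point is that a cube over d dimensions has 2^d cuboids, so its size grows quickly as dimensions and distinct values are added.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (region, product, year, sales). Values are made up.
FACTS = [
    ("north", "solar", 2018, 10),
    ("north", "wind", 2018, 5),
    ("south", "solar", 2018, 7),
    ("south", "solar", 2019, 3),
]
DIMENSIONS = ("region", "product", "year")


def build_cube(facts, dimensions):
    """Sum the measure (last column) over every subset of the dimensions."""
    cube = {}
    indices = range(len(dimensions))
    for k in range(len(dimensions) + 1):
        for subset in combinations(indices, k):
            cuboid = defaultdict(int)
            for row in facts:
                # Group key: the values of the selected dimensions only.
                key = tuple(row[i] for i in subset)
                cuboid[key] += row[-1]
            cube[tuple(dimensions[i] for i in subset)] = dict(cuboid)
    return cube


cube = build_cube(FACTS, DIMENSIONS)
print(len(cube))                           # number of cuboids: 2 ** 3
print(cube[("region",)])                   # sales rolled up by region
print(sum(len(c) for c in cube.values()))  # total number of stored cells
```

Even in this toy case, four input rows produce eight cuboids; with tens of dimensions and high cardinalities, materializing every cuboid is rarely practical, which is why partial materialization and approximation are common design choices.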
Unfortunately, state-of-the-art technical solutions cannot compute data cubes over the data in big data repositories, for two reasons: (1) the volume, which is truly explosive in these data sets; and (2) the multidimensional complexity (e.g., cardinality assignments, irregular hierarchies, dimension properties), which can be very high in these data sets.

In fact, working tables can easily grow very large when cubes are calculated over big data collections. This causes significant computational problems, as practical applications may have even larger targets (e.g., [12]). Complexity is just as important, because creating OLAP data models over big data raises complexity issues that do not occur in traditional OLAP settings (such as relational environments). For example, because of the structuring of big data collections, the number of formats can grow explosively, with multiple (and non-homogeneous) means for such data mixes.

OLAP data cube design methods have long been important in database and data warehouse research. For OLAP design methodologies over big data, the performance aspect should be considered more closely because of the obvious side effects of this design activity. Designers should focus on the following important questions: (1) How long does it take to build the data cube being designed (calculating totals over big data may be infeasible)? (2) How should the data cube be updated, and which maintenance plan should be selected? (3) Which construction strategies should be chosen (e.g., sharing and invasion, e.g., [16])?

Memory also plays an important role and is strongly related to the problem of computational methodology. The main problem in this context is that an OLAP data cube computed over big data must be held in memory. This is a serious problem that needs to be considered regardless of how large the cube is, and solutions to it need to be explored. In addition, the emergence of innovative hardware solutions, such as data manipulation based on GASU (e.g., [15]), will shape how OLAP data cubes are calculated on big data.

With respect to the query-oriented and user-oriented aspects, we have identified the following issues. (1) Query languages and optimization: conventional MDX methods do not include optimization solutions that take big data needs into account, so future research should focus on the optimization issues raised by big data processing. (2) User performance: OLAP data cubes over big data tend to be huge, so user performance can easily degrade, especially during aggregation and querying; end-user performance should therefore be considered a critical factor in this work.

Designing OLAP Data Cubes vs. Large Data

Quality problems are becoming increasingly important in next-generation OLAP data warehouses and methods for big data. Given the highly dispersed nature of big data sources, calculations over these data sources can easily be of only "moderate" quality. It is therefore easy to see why quality control of the resulting data cube is necessary. Usability issues are also important, as OLAP data cubes clearly need to be compiled and maintained in order to produce useful analyses. In fact, this aspect raises a number of research problems, such as the development of methods to "measure" the amount of OLAP data obtainable from big data pools. At a lower level, visualization and interactive exploration are relevant issues for OLAP over big data. The visualization of OLAP data models (e.g., [13, 14]) in particular plays an important part when data volumes are large. As a result, new visual metaphors, methods, and solutions have been proposed to address the challenge of visualizing large OLAP data cubes over big data: fast and effective visualization of results and data cores, visualization of forms, and effective visualization on portable devices.

Figure 1. Increase in Big Data from different Sources.

How to design an analytical process in OLAP is discussed below. Informed Information Form (under analysis). Integration with classic data-intensive platforms: OLAP models, technologies, and platforms have been integrated into classic data-intensive platforms in a smooth and wide manner. Scaling data-intensive systems.

3. Big Data main sources
In general, an Internet search engine (for example, Google or Yahoo) handles queries that select strings from generic resources. Research efforts in this area aim to explain the query languages at work in SQL development, and it can be frustrating to see the dominance of the most important SQL query languages. We believe that advanced search engines and database development tools can take full advantage of complex data to view and index a large technological infrastructure and large amounts of content on the Internet. Advanced query languages and design tools can be decisive for viewing the latest database features published in the Web cloud.
4 Big Data Security and Privacy
"Data mining" started as a bad term, then became a good one, and is now turning bad again. That is why the term "big data" was created: to put a better face on data mining. Recall that "data mining" was originally a term applied by statisticians to indicate that results were based on unreliable statistical evidence; it was something to avoid. However, people in scientific computing and others began to use the term for drawing useful (and perhaps important) conclusions from data. Later, "data mining" came to be associated with data that should not be extracted from a database, and users began to worry about data mining being used for tasks they deemed inappropriate.
Figure 2. Big Data types.

Big data platforms can be addressed only from a technological and practical perspective that includes database techniques (including database theory). For example, much effort is now devoted to returning smart results for general inputs; see the many web suggestion features and Google Maps.

Google offers an important solution: the Google Search Appliance (GSA) provides enhanced access to public and private content, for example an unstructured database that cannot be reached directly through the search engine. For this purpose, components named connectors extend the GSA to non-Internet repositories [18]. For example, tuple data can be accessed as an XML document through custom connectors and converted to HTML documents for search. A database connector can thus be used as a simple data exchange mechanism to distribute data over the network.

Inspired by this solution, we shape our vision of data transfer: enriching the publication of databases by gathering additional concepts [17]. It should be emphasized that publishing data as content while enriching that content at the same time is very popular in social networks today and is, in particular, the basis of Twitter's success. More than seven years after its launch, Twitter has more than 200 million users, and one billion tweets are posted every two and a half days.

Data sharing [18] is the conversion of data samples from a source mode into a target mode so that the target data satisfy very specific integration constraints (functions and internal dependencies). The target mode includes many new features that are often defined using quantified variables. The main problem is to reduce arbitrary behavior when selecting variable values.

The assignment is done in the following way (as usual, lowercase and uppercase letters denote variables, and additional variables are used to distinguish quantified settings from free ones):

5 Applications of Big Data
First of all, recall that Google's advertising is managed by an algorithm that determines which advertisement should be displayed. No person looks at the data on which the decision is based. I have heard Google say that people should at least review the decisions made by the algorithm to see whether it works correctly. They can do so, even though it is not required: measuring the change in click-through rate (CTR) after changing the algorithm indicates which algorithm is preferable.

As for the users, it is better to leave their services. But the economy is not working, and I think it will. I know the next best thing for me.

In the early days, advertising on television and in newspapers or magazines was relatively poor, because it was used for many purposes. For example, golf club advertisements in Golf Digest cost about ten times as much. However, modern advertising technologies can increase the cost of an ad by a factor of ten. This growth has many advantages: it allows for less advertising, makes ads more useful, and supports freely available services. For example, this paper was written using Google data, with no advertising, only free tools.

6 Conclusion
This article has discussed big data topics such as OLAP over big data, big data transfer, and big data privacy, which are active subjects in big data research. Big data is a reliable and efficient element of new-generation information systems. We hope our work will become a turning point on this difficult path.

7 REFERENCES
[1] A. Abouzeid, K. Bajda-Pawlikowski, D. J. Abadi, A. Rasin, and A. Silberschatz. Hadoopdb: An architectural hybrid of mapreduce and dbms technologies for analytical workloads. PVLDB, 2(1):922–933, 2009.
[2] D. Agrawal, S. Das, and A. E. Abbadi. Big data and cloud computing: current state and future opportunities. In EDBT, pages 530–533, 2011.
[3] L. Bellatreche, A. Cuzzocrea, and S. Benkrid. Effectively and efficiently designing and querying parallel relational data warehouses on heterogeneous database clusters: The f&a approach. J. Database Manag., 23(4):17–51, 2012.
[4] V. R. Borkar, M. J. Carey, and C. Li. Inside "big data management": ogres, onions, or parfaits? In EDBT, pages 3–14, 2012.
[5] A. Cuzzocrea. Retrieving accurate estimates to olap queries over uncertain and imprecise multidimensional data streams. In SSDBM, pages 575–576, 2011.
[6] A. Cuzzocrea and S. Chakravarthy. Event-based lossy compression for effective and efficient olap over data streams.
[7] Data Knowl. Eng., 69(7):678–708, 2010.
[8] A. Cuzzocrea, F. Furfaro, S. Greco, E. Masciari, G. M. Mazzeo, and D. Saccà. A distributed system for answering range queries on sensor network data. In PerCom Workshops, pages 369–373, 2005.
[9] A. Cuzzocrea, D. Saccà, and P. Serafino. A hierarchy-driven compression technique for advanced olap visualization of multidimensional data cubes. In DaWaK, pages 106–119, 2006.
[10] Adams, M.N.: Perspectives on Data Mining. International Journal of Market Research, 52(1), 11–19 (2010)
[11] Asur, S., Huberman, B.A.: Predicting the Future with Social Media. In: ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 492–499 (2010)
[12] Bakshi, K.: Considerations for Big Data: Architecture and Approaches. In: Proceedings of the IEEE Aerospace Conference, pp. 1–7 (2012)
[13] Cebr: Data equity, Unlocking the value of big data. In: SAS Reports, pp. 1–44 (2012)
[14] A. Cuzzocrea, D. Saccà, and P. Serafino. Semantics-aware advanced olap visualization of multidimensional data cubes. IJDWM, 3(4):1–30, 2007.
[15] A. Cuzzocrea, I.-Y. Song, and K. C. Davis. Analytics over large-scale multidimensional data: the big data revolution! In DOLAP, pages 101–104, 2011.
[16] J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, 2008.
[17] Fayaz Ahmad Lone, Dr. Amit Kumar Chaturvedi, “Proposing a Novel Model on Security Challenges in Cloud Computing especially
[18] Social Media and social Sites”, International Journal For Computer Trends and Technology, Volume 47, Number 1, May 2017.
[19] V. Harsha Shastri, V. Sreeprada, T. Kavitha, “A Survey on Big Data Technologies, Challenges and Impact on Internet of things”, International Journal For Computer Trends and Technology, Volume 35, Number 3, May 2016.
[20] Shadi A. Aljawarneh, Vangipuram Radhakrishna, and John William Atwood. 2018. Ultimate: unearthing latent time profiled temporal associations. In Proceedings of the First International Conference on Data Science, E-learning and Information Systems (DATA ’18). ACM, New York, NY, USA, Article 29, 8 pages. DOI: https://doi.org/10.1145/3279996.3280025
[21] Vangipuram Radhakrishna, Shadi Aljawarneh, P. V. Kumar, and Aravind Cheruvu. 2018. Kaala vrksha: extending vrksha for time profiled temporal association mining. In Proceedings of the First International Conference on Data Science, E-learning and Information Systems (DATA ’18). ACM, New York, NY, USA, Article 30, 8 pages. DOI: https://doi.org/10.1145/3279996.3280026
[22] Shadi Aljawarneh, Vangipuram Radhakrishna, and Gali Suresh Reddy. 2018. Mantra: a novel imputation measure for disease classification and prediction. In Proceedings of the First International Conference on Data Science, E-learning and Information Systems (DATA ’18). ACM, New York, NY, USA, Article 25, 5 pages. DOI: https://doi.org/10.1145/3279996.3280021
[23] Shadi Aljawarneh, V. Radhakrishna, and Aravind Cheruvu. 2018. VRKSHA: A Novel Multi-Tree Based Sequential Approach for Seasonal Pattern Mining. In Proceedings of the Fourth International Conference on Engineering & MIS 2018 (ICEMIS ’18). ACM, New York, NY, USA, Article 37, 10 pages. DOI: https://doi.org/10.1145/3234698.3234735
[24] Vangipuram Radhakrishna, P. V. Kumar, V. Janaki, and Shadi Aljawarneh. 2018. GANDIVA - Time Profiled Temporal Pattern Tree. In Proceedings of the Fourth International Conference on Engineering & MIS 2018 (ICEMIS ’18). ACM, New York, NY, USA, Article 36, 6 pages. DOI: https://doi.org/10.1145/3234698.3234734
[25] Aljawarneh, S.A. & Vangipuram, R. GARUDA: Gaussian dissimilarity measure for feature representation and anomaly detection in Internet of things. J Supercomput (2018). https://doi.org/10.1007/s11227-018-2397-3
[26] Baraa K. Muslmani, Saif Kazakzeh, Eyad Ayoubi, and Shadi Aljawarneh. 2018. Reducing integration complexity of cloud-based ERP systems. In Proceedings of the First International Conference on Data Science, E-learning and Information Systems (DATA ’18). ACM, New York, NY, USA, Article 37, 6 pages. DOI: https://doi.org/10.1145/3279996.3280033
[27] S. A. Aljawarneh, V. Radhakrishna and A. Cheruvu, "Extending the Gaussian membership function for finding similarity between temporal patterns," 2017 International Conference on Engineering & MIS (ICEMIS), Monastir, 2017, pp. 1-6. doi: 10.1109/ICEMIS.2017.8273100
[28] S. A. Aljawarneh, V. RadhaKrishna and G. R. Kumar, "A fuzzy measure for intrusion and anomaly detection," 2017 International Conference on Engineering & MIS (ICEMIS), Monastir, 2017, pp. 1-6. doi: 10.1109/ICEMIS.2017.8273113
[29] S. A. Aljawarneh, V. R. Krishna and A. Cheruvu, "Finding similar patterns in timestamped temporal datasets," 2017 International Conference on Engineering & MIS (ICEMIS), Monastir, 2017, pp. 1-5. doi: 10.1109/ICEMIS.2017.8273105
[30] Shadi A. Aljawarneh, Mohammed R. Elkobaisi, Abdelsalam M. Maatuk, A new agent approach for recognizing research trends in wearable systems, Computers & Electrical Engineering, Volume 61, 2017, Pages 275-286, ISSN 0045-7906.
[31] Shadi A. Aljawarneh, Radhakrishna Vangipuram, Veereswara Kumar Puligadda, Janaki Vinjamuri, G-SPAMINE: An approach to discover temporal association patterns and trends in internet of things, Future Generation Computer Systems, Volume 74, 2017, Pages 430-443, ISSN 0167-739X, https://doi.org/10.1016/j.future.2017.01.013.
