

A Big Data Prediction Framework for Weather Forecast Using MapReduce Algorithm

Article in Journal of Computational and Theoretical Nanoscience · November 2017

DOI: 10.1166/asl.2017.10237





RESEARCH ARTICLE XXXXXXXXXXXXXXXXX

Copyright © 2016 American Scientific Publishers. All rights reserved.
Advanced Science Letters, Vol. XXXXXXXXX
Printed in the United States of America

Khalid Adam¹, Mazlina Abdul Majid¹, Mohammed Adam Ibrahim Fakherldin², Jasni Mohamed Zain¹
¹Faculty of Computer Systems & Software Engineering, University Malaysia Pahang, 26300 Kuantan, Pahang, Malaysia
²Faculty of Computer Science and Information Systems, Jazan University, P.O Box 114-, Saudi Arabia

Weather forecasting plays a vital role in daily routines and in business decisions. The process of weather forecasting has developed alongside advances in technology, and as the size of the data involved has grown, weather forecasting has come to be recognized as a big data problem. The authors reviewed current forecasting processes and methods and identified the need for a data structure capable of handling the large weather datasets used in forecasting. This paper presents a big data analysis framework for weather datasets based on the MapReduce algorithm, which offers not only weather dataset analysis but also various analytic capabilities on huge amounts of data. In addition, this work establishes a guideline for researchers and industrial practitioners on how to analyze big data.

Keywords: Big Data, Data Analysis, Weather Forecasting, MapReduce, Prediction Framework.

1. INTRODUCTION

Studying historical data for the purpose of future prediction and planning has been a core concept in big data analysis and decision making. In this paper, we focus on studying historical weather data from the National Oceanic and Atmospheric Administration (NOAA). The data was collected over a span of about 11 years, from 1997 to 2007. Big Data is a term used to describe the exponential growth of data, both structured and unstructured, coming from everywhere, such as social media, videos, digital pictures and sensors, which makes it difficult to use software tools to capture, analyze, manage and process the data within a tolerable elapsed time [1]. Big data has three characteristics: high Volume, high Velocity and high Variety. According to Bryson [2], "Weather is the original big data problem". Whichever approach is followed, weather forecasting is an initial value problem: as the size of the initial data increases, the accuracy of forecasting increases [3]. According to Nancy Grady, the velocity of weather data plays a role in the development of the economy. Weather data can be combined with other disciplines, which can generate new opportunities for weather prediction [4].

Prediction based on temperature is important to agriculture and commodity markets. Temperature prediction is also used by utility companies to estimate demand over the coming days. On an everyday basis, people use weather forecasts to decide what to wear on a given day [5]. Because outdoor activities are severely curtailed by wind chill, heavy rain and snow, weather forecasts can also be used to plan activities around these events, and to plan ahead and survive them. To predict the weather effectively and help overcome such problems, we propose a prediction framework for weather with big data using the MapReduce algorithm. The advantage that big data has over other weather prediction methods is that it minimizes the error using various algorithms and gives a predicted value that is nearly equal to the actual value.

Knowing how the weather evolves over time in a given location or country can be beneficial for several purposes; for example, such knowledge can be used for future predictions. Forecasts based on temperature and precipitation are important to agriculture, and therefore to traders in commodity markets. Temperature forecasts are used by utility companies to estimate demand over the coming days. On an everyday basis, people use weather forecasts to decide what to wear on a given day. Because outdoor activities are severely curtailed by heavy rain, snow and wind chill, forecasts can be used to plan activities around these events, and to plan ahead and survive them.

2. HADOOP/MAPREDUCE

This section introduces the Hadoop/MapReduce approaches used in this study, namely MapReduce and the Hadoop Distributed File System (HDFS), and describes the MapReduce execution process.

2.1 MapReduce

MapReduce is the programming model that allows Hadoop to process large amounts of data efficiently [6]. MapReduce breaks large data processing problems into multiple steps, namely a set of Maps and Reduces, each of which can be worked on at the same time (in parallel) on multiple computers. MapReduce is designed to work with HDFS. Apache Hadoop automatically optimizes the execution of MapReduce programs so that, where possible, a given Map task runs on the HDFS node that locally holds the data blocks it needs.

Fig.1. MapReduce Architecture

2.2 MapReduce Execution Process Steps

The MapReduce algorithm uses three main steps: the Map Function, the Shuffle Function and the Reduce Function.

The Map Function is the first step in the MapReduce algorithm. It takes input tasks (datasets), divides them into smaller sub-tasks, and then performs the required computation on each sub-task in parallel. This step performs two sub-steps. The Splitting step takes the input dataset from the source and divides it into smaller sub-datasets. The Mapping step takes those smaller sub-datasets and performs the required action or computation on each of them. The output of the Map Function is a set of key-value pairs, <Key, Value>, as shown in Fig.2.

Fig.2. Map Function

The Shuffle Function is the second step in the MapReduce algorithm. It takes the list of outputs coming from the Map Function and performs two sub-steps on each key-value pair. The Merging step combines all key-value pairs that have the same key (that is, it groups key-value pairs by comparing their keys) and returns <Key, List<Value>>. The Sorting step takes the output of the Merging step and sorts all key-value pairs by key; it also returns <Key, List<Value>>, but with the pairs sorted. Finally, the Shuffle Function returns a list of sorted <Key, List<Value>> pairs to the next step.

Fig.3. Shuffle Function
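The Splitting, Mapping, Merging and Sorting sub-steps described above can be illustrated with a minimal single-process Python sketch. It is not Hadoop code: the record format ("YYYYMM,temperature") and the names split_dataset, map_function and shuffle_function are hypothetical, and the splits are mapped one after another here rather than in parallel on separate nodes.

```python
from collections import defaultdict

def split_dataset(lines, split_size):
    """Splitting: divide the input dataset into smaller sub-datasets."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]

def map_function(sub_dataset):
    """Mapping: emit one <Key, Value> pair per record of a sub-dataset.

    Assumes hypothetical records of the form "YYYYMM,temperature".
    """
    pairs = []
    for line in sub_dataset:
        month, temp = line.split(",")
        pairs.append((month, int(temp)))
    return pairs

def shuffle_function(mapped_outputs):
    """Merging groups values by key; Sorting orders the groups by key."""
    merged = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            merged[key].append(value)
    return sorted(merged.items())

records = ["200101,12", "200102,15", "200101,14", "200102,17", "200101,10"]
splits = split_dataset(records, split_size=2)
# In Hadoop each split would be mapped on its own node; here they run in turn.
mapped = [map_function(s) for s in splits]
shuffled = shuffle_function(mapped)
print(shuffled)  # [('200101', [12, 14, 10]), ('200102', [15, 17])]
```

The output keeps the <Key, List<Value>> shape that the Shuffle Function hands to the Reduce step.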

The Reduce Function is the final step in the MapReduce algorithm. It performs only one step, the Reduce step: it takes the list of sorted <Key, List<Value>> pairs from the Shuffle Function and performs the reduce operation, as shown in Fig.4.

Fig.4. Reduce Function
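The Reduce step can be sketched in the same hypothetical, single-process style. Here reduce_function is an illustrative name, and its input is assumed to be the sorted <Key, List<Value>> pairs a Shuffle Function would produce. Summing mirrors the reducer in this paper's algorithm; passing another operation such as max shows that the reduce operation itself is interchangeable.

```python
from functools import reduce
from operator import add

def reduce_function(shuffled_pairs, operation=add):
    """Reduce step: fold each key's list of values into a single value."""
    return [(key, reduce(operation, values)) for key, values in shuffled_pairs]

# Sorted <Key, List<Value>> pairs, as a Shuffle Function would return them
# (hypothetical monthly temperature readings).
shuffled = [("200101", [12, 14, 10]), ("200102", [15, 17])]
print(reduce_function(shuffled))       # [('200101', 36), ('200102', 32)]
print(reduce_function(shuffled, max))  # [('200101', 14), ('200102', 17)]
```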


3. EXPERIMENTAL SET UP

Hadoop 2.7.1 is used to carry out the analysis of the weather data using the MapReduce algorithm. The cluster consists of three Linux machines. The master (Intel Core i7 CPU, 4 GB RAM, 1 TB HDD) has two roles: managing the cluster and, at the same time, working as a slave node (DataNode). The other two machines each have an Intel Core i3 CPU, 4 GB RAM and a 1 TB HDD. Koop's paper points to diverse and enormous opportunities for forecasting with big data. At present, research into using big data to improve the accuracy of weather forecasting is expanding, and initial results reveal that big data would immensely benefit weather forecasts [7, 8]. Indeed, weather forecasting has been one of the principal beneficiaries of big data; however, forecasts are still erroneous beyond a week [9]. The questions we have attempted to answer are:

I. Conventional forecasting tools have deficiencies in handling the speed, size and inherent complexity of big data. Do these data mean anything worth looking at?

II. There is no standard for big data weather forecasting, and the growth of big data has significant effects on weather forecasting. What is the difficulty behind forecasting with big data?

Fig.5. The Proposed Framework

Fig.5 shows the framework of the modeling system. In this framework, when a MapReduce task is started, the Master node delegates the tasks to the other machines and keeps a complete overview of what is happening in the network. The Master node assigns new tasks to worker nodes and reassigns tasks that take too long. The sequence is as follows.
Step1: The input weather data is split into a number of pieces of a specified size (64 MB). The weather algorithm is started on all nodes.
Step2: One node is set to be the Master and starts delegating work to the other (DataNode) nodes. All pieces created in Step1 are first mapped by the mapping function. The number of reduce tasks at the start should be low.
Step3: If a worker gets a map task, it runs the map task and stores the result in the memory of the machine.
Step4: Periodically, these stored results are written to disk and the Master node is notified of their location.
Step5: When the Master node is notified of the location of mapped pairs, it starts a reduce task on one of the free workers.
Step6: When a reduce task is called, it first fetches the stored results from the remote machine on which the map task ran. Second, these results are sorted by key. Third, the results are reduced.
Step7: When there is no more data to process, the Master node returns the final results to the user program.
Throughout, the Master node keeps an overview of what all nodes are doing. The master will also re-assign already assigned tasks to idle nodes, because this might improve overall performance.
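The seven-step sequence above can be approximated on a single machine, with a thread pool standing in for the worker nodes. This is a simplified simulation under stated assumptions, not Hadoop's scheduler: SPLIT_SIZE (two records instead of 64 MB), map_task, reduce_task and run_job are hypothetical names, map outputs are handed back to the master directly instead of being written to disk, and re-assignment of slow tasks is not modeled.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

SPLIT_SIZE = 2  # hypothetical stand-in for the 64 MB split size of Step1

def map_task(split):
    """Step3: a worker maps its split and keeps the output in memory."""
    pairs = []
    for line in split:
        key, temp = line.split(",")
        pairs.append((key, int(temp)))
    return pairs

def reduce_task(key, values):
    """Step6 (third part): reduce the grouped values for one key (here: sum)."""
    return key, sum(values)

def run_job(records, num_workers=3):
    # Step1: split the input into pieces of a fixed size.
    splits = [records[i:i + SPLIT_SIZE] for i in range(0, len(records), SPLIT_SIZE)]
    with ThreadPoolExecutor(max_workers=num_workers) as workers:
        # Step2: the master hands one map task per split to the worker pool.
        map_outputs = list(workers.map(map_task, splits))
        # Steps 4-5 and the fetch part of Step6 (simplified): map outputs are
        # returned to the master directly and grouped by key, instead of being
        # written to disk and fetched by reducers from remote machines.
        grouped = defaultdict(list)
        for pairs in map_outputs:
            for key, value in pairs:
                grouped[key].append(value)
        keys = sorted(grouped)  # Step6: sort by key
        # Reduce tasks run on whichever pool threads are free.
        results = list(workers.map(reduce_task, keys, [grouped[k] for k in keys]))
    # Step7: return the final results to the user program.
    return dict(results)

print(run_job(["200101,12", "200102,15", "200101,14"]))
# {'200101': 26, '200102': 15}
```

Using executor.map keeps results in submission order, so the final dictionary follows the sorted key order produced in Step6.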


4. RESULTS AND DISCUSSION

From the dataset of observations presented, we study the temperature factors, which are important to agriculture and therefore to traders in commodity markets. Temperature forecasts are used by utility companies to estimate demand over the coming days. The algorithm consists of two stages, a map function and a reduce function, and when a function is called the following steps take place. The first procedure is to preprocess the input weather data and split it into a number of pieces of a specified size; the input data are aggregated from NOAA for the analysis. In the second procedure, the pieces are mapped by the mapping function (lines 1-6 of the Algorithm): the processed data are used to create the exact model as (key, value) pairs, with temperature chosen as the independent variable. The third procedure reduces the keys and values (lines 7-12 of the Algorithm); in this final step, the (key, value) residual is reduced for the final model. If required, the previous procedures are repeated using the transformed variable in the final procedure (lines 7-12 of the Algorithm); otherwise, the final model is returned.

Algorithm: Proccess_weather dataset

The mapper emits an intermediate key-value pair for each weather record. The reducer sums the temperature values collected for each key.

Input: weather dataset

Begin
1: class Mapper
2:   method Map(LongWritable, Text, Text, IntWritable)
3:     for all input lines (Text) do
4:       temp ← temperature field extracted from the line
5:       if StringUtils.isNumeric(temp) then
6:         Emit(datepart, temperature)
End.

Begin
7: class Reducer
8:   method Reduce(Text, IntWritable, Text, IntWritable)
9:     sum ← 0
10:    for all temp in temps [temp1, temp2, ..., tempn] do
11:      sum ← sum + temp
12:    Emit(key, sum)
End.

Fig.6. Average Temperatures in 2001

Fig.6 shows the average temperature trend from January to December. The average temperatures increase gradually from January to June, remain almost the same between July and August, and drop slightly from September to December.

Fig.7. Average Temperatures in 2002

Fig.7 shows the average temperature trend from January to December. The average temperatures in January and February are similar; they then increase gradually from March to July, and are lower in December.
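The Proccess_weather listing can be turned into runnable Python along these lines. This is a sketch under assumptions, not the authors' implementation: the record format "YYYYMMDD,temperature" is hypothetical (the paper does not specify the NOAA field layout), the reducer also performs the grouping that Hadoop's shuffle would normally do, and because Fig.6 and Fig.7 report averages while the listing only sums, the reducer additionally divides each sum by its count. The numeric check accepts a leading minus sign; Apache Commons' StringUtils.isNumeric does not, so a literal port would silently drop sub-zero readings.

```python
import re
from collections import defaultdict

# Signed integer check (unlike StringUtils.isNumeric, which rejects "-2").
NUMERIC = re.compile(r"[+-]?\d+")

def mapper(lines):
    """Lines 1-6: emit (datepart, temperature) for each parseable record."""
    for line in lines:
        date, temp = line.strip().split(",")  # hypothetical "YYYYMMDD,temp"
        if NUMERIC.fullmatch(temp):
            yield date[:6], int(temp)  # datepart = YYYYMM

def reducer(pairs):
    """Lines 7-12: sum temperatures per key, plus the monthly mean.

    The grouping done here would be Hadoop's shuffle in a real job.
    """
    sums, counts = defaultdict(int), defaultdict(int)
    for key, temp in pairs:
        sums[key] += temp
        counts[key] += 1
    return {key: (sums[key], sums[key] / counts[key]) for key in sorted(sums)}

records = [
    "20010101,10", "20010115,14", "20010120,N/A",  # N/A is skipped
    "20010201,-2", "20010210,6",
]
print(reducer(mapper(records)))
# {'200101': (24, 12.0), '200102': (4, 2.0)}
```

Plotting the mean column per month for a given year would give curves of the kind shown in Fig.6 and Fig.7.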

Fig.8. Average Temperatures for four years



Weather forecasting statistics: Fig.8 indicates the annual pattern of average temperature as a function of the month of the year for the last 11 years. Note that most of the high average temperatures occur during the convective season (May, June, July, August, September and October).

5. CONCLUSION

In this study, big data analysis is used for predicting temperature by applying the MapReduce algorithm to big data. The implementation of this framework illustrates how an intelligent system can be efficiently integrated with a big data prediction framework to predict temperature. This method proves to be a simplified conjugate gradient method. When implemented in the framework on the cluster, the performance of MapReduce was satisfactory, as there was no substantial number of errors in data processing. The big data approach to temperature forecasting is capable of yielding good results and can be considered an alternative to traditional meteorological approaches. This approach is able to determine the nonlinear relationship that exists between the historical data (temperature, wind speed, humidity, etc.) supplied to the system during the training phase and, on that basis, predict what the temperature will be in the future.
However, a limitation of this study is that the proposed framework uses only structured data, instead of both structured and unstructured data, to improve temperature prediction accuracy.

REFERENCES

[1] Avita Katal, Mohammad Wazid and R. H. Goudar. Big Data: Issues, Challenges, Tools and Good Practices. IEEE (2013), 978-1-4799-0192-0/13.
[2] Bryson Koeler, CIO, Weather Channel. [Online]. http://www.nasdaq.com/article/big-data-delivers-fewer-hunchesmore-facts-to-weather-channel-20130211-00763#.UUxqs1s4VIh (2013).
[3] Nick Wakeman, Washington Technology. [Online]. http://washingtontechnology.com/blogs/editorsnotebook/2012/11/sandy-big-data-opportunity.aspx (2012).
[4] Nancy Grady, David Green and Troy Anselmo. Big Data Effects on Weather and Climate. Informal Discussions on The New Economics. s.l.: SAIC (2013).
[5] Arribas-Bel. Accidental, open and everywhere: Emerging data sources for the understanding of cities. Applied Geography, forthcoming (2013).
[6] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Google, Inc. (2004).
[7] Knapp. Forecasting the Weather with Big Data and the Fourth Dimension. Available via: http://www.forbes.com/sites/alexknapp/2013/06/13/forecasting-the-weather-with-big-data-and-the-fourth-dimension/2/ (2013).
[8] Hamm. How Big Data can Boost Weather Forecasting. Available via: http://readwrite.com/2013/02/28/how-big-data-can-boost-weatherforecasting#awesm=~ou64ZEaKe2HtUu (2013).
[9] Silver. The Signal and the Noise: The Art and Science of Prediction. Penguin Books, Australia (2013).
