
2016 2nd International Conference on Next Generation Computing Technologies (NGCT-2016)

Dehradun, India 14-16 October 2016

Exploratory Data Analysis on Temperature Data of Indian States from 1800-2013
(Analysis of Trends in Temperature Data for Prediction Modelling)

Aditya Agrawal
Bharati Vidyapeeth’s College of Engineering, New Delhi, India
agrawal.aditya@outlook.com

Dharvi Verma
Bharati Vidyapeeth’s College of Engineering, New Delhi, India
dharvi.verma.in@ieee.org

Shilpa Gupta
Bharati Vidyapeeth’s College of Engineering, New Delhi, India
shilpa.gupta@bharatividyapeeth.edu

Abstract—Information about climate change is required at global, regional and basin levels for a variety of purposes, including the study of the impact of greenhouse gases. The analyses in this research relate to the observation of trends in the temperatures of the Indian states. The research begins with an exposition of the analysis methodologies currently prevalent in exploratory analysis and prediction modeling on temperature data. It then develops into the proposed work, where the analysis of the means of the average temperatures observed across the Indian states from 1800-2013 is summarized, and is found to reveal confounding results. The proposed work concludes with a more focused analysis of geographically similar states, namely the states lying on the Indo-Gangetic plains, which reveals encouraging results and shows the occurrence of a trend. The research concludes by setting out the future scope, which includes modeling to predict the average temperatures that may be attained over the next few decades, which in turn would be significant for observing the corollaries of global warming in India.

Keywords—exploratory analysis; Indian states; temperature; ggplot2; R; RStudio

I. INTRODUCTION

There has been a sizeable uptick in warming, and the threat of climate change is more startling than it has ever been. Scientists have performed exhaustive analyses and developed prediction models to ascertain the rise in temperatures and to prognosticate how the trends will stand in the decades to come. A huge amount of data for studying climate change has been collected over approximately a hundred and thirty years, and researchers have used it to establish the very real hazard of global warming. Examining climate change trends entails a long-term evaluation of data, the preparation of which is highly tricky. Collection, cleaning and preprocessing of this data are complex, which is why any survey of this data needs to be very thorough. Mercury thermometers were used for collection earlier, which could give unreliable results owing to variations in visit time. There was a shift to electronic thermometers in the 1980s, which became a more reliable source of data collection.

This paper is based on the exploratory analysis of data extracted from Berkeley Earth [1] to evaluate trends in the temperatures of Indian states from 1800-2013. Berkeley Earth is a non-profit organization which analyzes Earth’s surface temperature records and disseminates its scientific investigations to shed light on global warming by addressing the concerns of the skeptics. Berkeley Earth’s study on the surface temperatures of Earth consists of more than 1.5 billion temperature reports. The data on global land and land-and-ocean temperatures consists of records which list statistics for the temperature recorded on the day of measurement, such as the average temperature of land, the uncertainty over the average land temperature, the maximum and minimum temperature, and the land-and-ocean average temperature and its uncertainty. This facilitates an exhaustive analysis of data which can aid in deciphering trends across a region or a group of regions.

Temperature data has been studied in the past to find the underlying trends which could account for warming. Analyses have been performed over the years to determine metrics such as the highest increase in temperatures over a span of time, the regions contributing the most to an increase in the national average temperature of a country, the probability distribution (normal, beta, uniform, exponential, etc.) of the average temperatures of a nation, the correlation between the solar output and the Earth’s surface temperatures, the association between the density and temperature of a country over the years to study climate change trends, and tendencies in the average temperature uncertainty over the years to check for noisy data.

The data extracted from Berkeley Earth was extremely exhaustive and allowed for subsets such as those arranged by country or city, which in turn enabled a study of the average temperatures observed in the states of India from 1800-2013. The RStudio IDE [2], based on R [3], was used for the analysis, and the dataset with average temperatures and average temperature uncertainty grouped by state was used to extract the data specific to India. Trends, if any, existing in temperatures were analyzed for different states. The trends in the temperatures across the states, over the years, threw up some interesting results. Many states were found to account for the warming trends in different years and to varying degrees. The analysis of the current trends can provide the basis for a rigorous prediction model, which in turn can predict temperature trends for the coming years from the available facts, figures and past patterns.

978-1-5090-3257-0/16/$31.00 ©2016 IEEE



The paper makes a case for the exploratory data analysis of the temperature data for the states of India from 1800-2013. After a review of the related work in this domain, the problem statement is described thoroughly. The fourth section presents the proposed work and provides illustrations to corroborate the trends mentioned in that section. This is followed by a section on the future scope of this analysis, which would extend its capabilities and thereby derive a large number of exploitable results. The paper concludes with a summary of the analysis performed, the trends exhibited and the irregularities observed, and advocates the development of a sound prediction model.

II. RELATED WORKS

A. Analyzing the Average Annual Temperature Increase for every city over the 1800-2013 period

Implemented by Ben Watson [4], this methodology analyzes the average annual increase in temperature for each month for every city on the planet from 1800-2013. The dataset in which global temperatures were arranged by city was selected, some data points were created and metadata sets for each city were taken. A data table was created using the data.table [5] package to facilitate quicker subsetting. Maps were plotted to show the trends for every month. This methodology provides highly informative and systematic visuals. By analyzing the twelve graphs, one can ascertain the climatic trends across various regions of the world and derive conclusions according to the problem statement.

Fig. 1 and Fig. 2 show, as a sample, the average annual increase in temperature for the months of January and June, as plotted by Ben Watson for his analysis of the dataset.

Fig. 1. Average Annual Temperature Increase in January

Fig. 2. Average Annual Temperature Increase in June

B. Analyzing the change in land temperature

Implemented by J. Alejandro Gelves [6], this methodology maps the change in land temperature for different regions. The author selected the dataset of global temperatures by state and chose the United States for an analysis of the change in land temperature in the continental US. The author groups the data by year to find the average temperature for each year, organizes the data into data frames and obtains choropleth maps for the years 1850 and 2013.

Fig. 3 and Fig. 4 were obtained by J. Alejandro Gelves. Fig. 3 illustrates the choropleth map of land temperatures for different states of the US in the year 1850. Fig. 4 illustrates the choropleth map of land temperatures for different states of the US in the year 2013.

Fig. 3. Land Temperatures in some states of USA in 1850

Fig. 4. Land Temperatures in some states of USA in 2013

C. Methodology to detect monotonic trends in annual average temperatures using Mann-Kendall test

Manohar Arora et al. investigated trends in the temperature time series of 125 stations spread across India [7]. The Mann-Kendall test, a non-parametric test used to assess monotonic upward and downward trends, was applied. Mean, mean maximum and mean minimum temperatures were the variables considered for the analysis, which was carried out on both annual and seasonal temperatures. The seasonal analysis, in which every year was divided into four seasons, namely pre-monsoon, monsoon, post-monsoon and winter, allowed for a study of the trends observed in each season for every parameter. Annual trends, regional trends, seasonal trends and the percentage of significant trends, among others, were examined.

D. Analysis of trends in temperature using Artificial Neural Network

Climate change data is often irregular and non-linear in nature. This led K. Abhishek et al. [8] to apply Artificial Neural Networks to the analysis of weather data and the development of reliable non-linear predictive models. The structural relationships between different entities were demonstrated using the features of artificial neural networks, which in turn facilitates an effective analysis of the trends existing in temperature data as well as the setup of an efficient weather forecasting model.

III. PROBLEM STATEMENT

The analysis of temperature trends across India has been a common question for researchers, but few studies have focused on finding trends with the aim of predicting the average temperatures which will be prevalent in the next few decades. The analysis of geographically similar regions can evince significant results which will play a major role in the prediction of the average temperatures that will be attained in these regions. These predictions will help in gauging the rate at which the climate is heating up in these regions, which in turn will help in predicting the impact of greenhouse gases as well as provide the stimulus for the countermeasures required for palliating the effects of global warming. These analyses and prediction models will also lay down a pathway for similar research on other geographically related areas.

IV. PROPOSED WORK

The work proposed in this analysis takes inspiration from the work done by J. Alejandro Gelves [6]. The data obtained from Berkeley Earth was put through preprocessing procedures, which yielded the complete dataset of Indian states. The dplyr [9] package was instrumental in the processing of the data and also in the further analysis. The processed data that was finally obtained contained the following fields:

1. Date: String value of the date, which ranged from the 1st of every month of 1796 to the 1st of every month of 2013.
2. AverageTemperature: Average land temperature in Celsius.
3. AverageTemperatureUncertainty: Provides the 95% confidence interval around the average temperature (AverageTemperature). Gives a measure of the uncertainty.
4. State: Gives the name of the state of India for which data is provided.
5. Country: This field is constant and set to the string value of ‘India’.

This dataset was put through further processing scripts to obtain the Date value in numeric format with separate fields for Date, Month and Year, as the Date values were originally contained in string format. This processing was done using the separate() function of the tidyr [10] package. The data in the resulting format contained 81,620 row entries over 7 column fields, i.e. Date, Month, Year, AverageTemperature, AverageTemperatureUncertainty, State and Country.

Upon further inspection of the data using the unique function of base R, it was revealed that the number of states accounted for in the data was 34, which included the 28 states (there were 28 states at the time of collection/estimation of the data) and 6 Union Territories, namely Chandigarh, Dadra and Nagar Haveli, Daman and Diu, Pondicherry, Andaman and Nicobar, and Delhi. Since some of the Union Territories do not occupy significant landmass and would not contribute significantly to the analysis, they were deemed redundant.

All the Union Territories except Delhi were removed from the dataset for the sake of a focused analysis. The filter function of the dplyr [9] package was used for this removal. All the above procedures, in conjunction, yielded the final processed data for analysis.
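The preprocessing steps described for the proposed work (splitting the date string into numeric fields, listing the distinct states, and dropping the Union Territories other than Delhi) can be illustrated with a minimal sketch. The authors worked in R with tidyr and dplyr; the version below is only a standard-library Python analogue, and the "YYYY-MM-DD" date format, the exact state spellings, the helper names and the sample rows are all assumptions made for illustration.

```python
# Union Territories dropped in the paper; Delhi is deliberately kept.
# The exact spellings used in the dataset are an assumption.
DROPPED_TERRITORIES = {
    "Chandigarh",
    "Dadra And Nagar Haveli",
    "Daman And Diu",
    "Pondicherry",
    "Andaman And Nicobar",
}


def split_date(row):
    """Return a copy of `row` with numeric Year, Month and Date (day) fields.

    Assumes the Date string looks like "YYYY-MM-DD".
    """
    year, month, day = (int(part) for part in row["Date"].split("-"))
    out = dict(row)
    out["Year"], out["Month"], out["Date"] = year, month, day
    return out


def preprocess(rows):
    """Split the date fields, then drop the unwanted Union Territories."""
    return [
        split_date(row)
        for row in rows
        if row["State"] not in DROPPED_TERRITORIES
    ]


# Made-up sample rows, for illustration only.
sample = [
    {"Date": "1900-01-01", "AverageTemperature": 14.0,
     "AverageTemperatureUncertainty": 0.5, "State": "Delhi", "Country": "India"},
    {"Date": "1900-01-01", "AverageTemperature": 16.5,
     "AverageTemperatureUncertainty": 0.6, "State": "Chandigarh", "Country": "India"},
    {"Date": "1900-02-01", "AverageTemperature": 18.0,
     "AverageTemperatureUncertainty": 0.4, "State": "Bihar", "Country": "India"},
]

processed = preprocess(sample)
states = sorted({row["State"] for row in processed})  # analogue of unique()
print(states)
```

Each processed row carries the same seven fields listed in the text, with Date now holding the numeric day of the month.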

A. Analysis of the data for 29 states (including Delhi)

The data obtained after the above procedures was pipelined through a series of functions provided by the dplyr package, which served the following purposes:

1. filter: for filtering the data before 1800 out of the provided dataset and obtaining the data from 1800-2013.
2. group_by: for grouping the data by year, which is required for the summarise function.
3. summarise: for summarizing the mean of the AverageTemperature of the states for a year into a new field variable, named Temp. The summarise function applies the same procedure to every group provided by the group_by function, and thus a list of means for every year was obtained in the field Temp.

The Temp variable and the years were plotted using the qplot function of the ggplot2 [11] package. The scatter and line plot, Fig. 5, shows a visible dip in the average temperatures of the Indian states under consideration. This dip was deemed an oddity in the analysis, mostly attributed to the huge diversity of the states of India, which range from the sub-freezing temperatures of Jammu-Kashmir to the extremely hot conditions prevalent in Rajasthan. Further, the root-mean-square value of the deviations for the period 1800-1850 will evidently be very high compared to the rest of the plot. The boxplots obtained at an interval of 50 years, i.e. for the years 1800, 1850, 1900, 1950 and 2000, also revealed similar results. Discrete values of the mean temperatures of the states for the respective year have been considered for the boxplots; this will be analyzed further by considering mean values over a period of years. This visual was also obtained using the qplot function of the ggplot2 package.

Fig. 5. India Average Temperature (1800-2013).

Fig. 6. Average Temperature of India at interval of 50 Years.

B. Analysis of the data for states around the Indo-Gangetic region (including Delhi)

The analysis of all the states of India produced inconsequential results. This was attributed to the diversity of India, where a significant change in any particular geographical region caused a change in the average over all the geographical regions taken as a unit. To eliminate this chaining effect of changes in different geographic systems, the research was focused on the analysis of a particular geographically similar region, namely the states in the Indo-Gangetic region.

The following states were selected for the further focus of the analysis: Delhi, Bihar, Uttar Pradesh, Madhya Pradesh, Haryana, Chhattisgarh, Jharkhand and West Bengal. The filter function was used to select the data of the aforementioned states from the data of all the states of India used in the previous section.

Fig. 7. Average Temperature of states of Indo-Gangetic region (1800-2013).
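The filter, group_by and summarise sequence used for both the all-India analysis and the Indo-Gangetic subset can be sketched as follows. This is a hedged standard-library Python analogue of the dplyr pipeline the authors describe, not their actual script; the function name, the sample rows and the choice to skip missing readings are assumptions introduced for illustration.

```python
from collections import defaultdict
from statistics import mean

# States named in the text as the Indo-Gangetic focus group.
INDO_GANGETIC_STATES = {
    "Delhi", "Bihar", "Uttar Pradesh", "Madhya Pradesh",
    "Haryana", "Chhattisgarh", "Jharkhand", "West Bengal",
}


def yearly_mean_temperature(rows, states=None, start_year=1800, end_year=2013):
    """filter -> group_by(Year) -> summarise(Temp = mean(AverageTemperature))."""
    groups = defaultdict(list)
    for row in rows:
        if not (start_year <= row["Year"] <= end_year):
            continue  # filter: keep only the 1800-2013 window
        if states is not None and row["State"] not in states:
            continue  # filter: optionally restrict to a set of states
        if row["AverageTemperature"] is None:
            continue  # assumption: missing readings are skipped
        groups[row["Year"]].append(row["AverageTemperature"])  # group_by Year
    # summarise: one mean value (Temp) per year
    return {year: mean(temps) for year, temps in sorted(groups.items())}


# Made-up rows, for illustration only.
rows = [
    {"Year": 1799, "State": "Delhi", "AverageTemperature": 20.0},
    {"Year": 1800, "State": "Delhi", "AverageTemperature": 24.0},
    {"Year": 1800, "State": "Bihar", "AverageTemperature": 26.0},
    {"Year": 1800, "State": "Rajasthan", "AverageTemperature": 28.0},
    {"Year": 1801, "State": "Delhi", "AverageTemperature": 25.0},
    {"Year": 1801, "State": "Rajasthan", "AverageTemperature": None},
]

all_india = yearly_mean_temperature(rows)
indo_gangetic = yearly_mean_temperature(rows, states=INDO_GANGETIC_STATES)
print(all_india)
print(indo_gangetic)
```

Passing a set of states reuses the same grouping and averaging for the regional subset, which mirrors how the paper repeats one pipeline on the filtered data.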

The pipelining of the data thus obtained through the series of functions was repeated as before, i.e. the data was filtered, grouped and then summarized using the functions filter, group_by and summarise.

The qplot of the data, obtained in the same manner as for all the states of India, revealed encouraging results and a regular trend. Fig. 7 points towards an increasing trend in the average temperatures of the states which belong to the Indo-Gangetic region and which share many geographical characteristics. The root-mean-square values of the scatter plot around the trend line would also be lower, as the average temperature readings are concentrated around a uniform trend.

This exploratory analysis of the data of the states around the Indo-Gangetic region was taken further and depicted in a plot of the boxplots of the average temperatures at an interval of 40 years, for the period 1800-2000. The boxplots consider single values of all the occurring mean temperatures for the states in the particular year, as depicted on the abscissa. Further analyses are proposed for the future, which will provide comprehensive detail by accounting for mean values over a time span of a few decades. The work done by David A. Abdo et al. [12], S. Kundu et al. [13] and A. Ghosh et al. [14] inspired us in the search for these trends.

The boxplots in Fig. 8 point in the same direction as the scatter-line plot of the average temperatures of the states around the Indo-Gangetic region. These plots show the potential for the development of a suitable prediction model, capable of predicting the average temperatures of these states in line with the rise depicted in the given plots.

Fig. 8. Average Temperature of Indo-Gangetic region at interval of 40 years (1800-2000).

This analysis opens up the scope for predicting future temperature recordings, which will have a wide-ranging impact on the way the rise in temperatures is viewed: a disastrous increase will need to be avoided, and if the prediction model predicts such catastrophic temperatures, measures would need to be adopted to thwart their occurrence.

V. FUTURE SCOPE

The analysis of the states around the Indo-Gangetic plains reveals a steady rise in the means of the average temperatures across these states, namely Delhi, Bihar, Uttar Pradesh, Madhya Pradesh, Haryana, Chhattisgarh, Jharkhand and West Bengal, over the years from 1800-2013.

This trend can be mapped onto a linear regression model, or another apposite prediction model can be developed, which will provide the probable average temperatures that these regions may encounter over the next few decades. This will facilitate the study of the temperature trends which may occur in the plains of India. It will help in gauging the impact of global warming and in fomenting countermeasures for checking it. Further study would be done to estimate the average temperature range that these states would face during the decade 2040-2050.

These analyses can also be carried out over other geographically similar areas, which may provide insights similar to or better than those observed for the Indo-Gangetic states of India, for studying the information which can be derived from the available data.

VI. CONCLUSION

The above research studies the methodologies prevalent for modeling temperature trends from wrangled temperature data, and also provides analyses of the temperature trends which were prevalent in specific geographically similar regions of India during the period 1800-2013. Different regions, owing to different geography and different environmental conditions, show different changes in temperature data, with some regions facing the brunt of global warming more than others. The analysis of the average temperatures of all the states of India, taken as a whole, did not allow for a reasonable analysis of trends. However, the visualized temperature data of the Indo-Gangetic region exhibited trends which allow scope for modeling, which will extend this research and allow for the development of a suitable prediction model for the average temperatures of certain regions of India. Such a model could aid in outlining the major regions where the maximum increase in temperature may be observed, which would in turn lead to scientific research on the specific causes and consequences of that increase. The study of trends which may occur in the future could be used as an effective tool to further the scientific literature in the domain of climate change.
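As an illustration of the linear regression model suggested in the Future Scope section, and of the root-mean-square spread around a trend line discussed for Fig. 7, a minimal ordinary least-squares sketch is given here. It is not the authors' model: the paper only proposes such a model as future work, and the yearly values, function names and the target year 2045 below are made up for illustration.

```python
from math import sqrt


def fit_linear_trend(years, temps):
    """Ordinary least-squares fit of temps = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rms_deviation(years, temps, slope, intercept):
    """Root-mean-square deviation of the points from the fitted line."""
    residuals = [y - (slope * x + intercept) for x, y in zip(years, temps)]
    return sqrt(sum(r * r for r in residuals) / len(residuals))


# Made-up yearly means (degrees Celsius), for illustration only.
years = [1800, 1850, 1900, 1950, 2000]
temps = [24.0, 24.2, 24.6, 24.7, 25.0]

slope, intercept = fit_linear_trend(years, temps)
spread = rms_deviation(years, temps, slope, intercept)
predicted_2045 = slope * 2045 + intercept  # naive linear extrapolation

print(f"slope per year: {slope:.4f}")
print(f"RMS deviation:  {spread:.4f}")
print(f"2045 estimate:  {predicted_2045:.3f}")
```

A small RMS deviation indicates that the yearly means sit close to the fitted line, which is the property the text attributes to the Indo-Gangetic data. Extrapolating a straight line several decades ahead is a strong assumption, so any such estimate would need validation against held-out years.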

REFERENCES

[1] "Home - Berkeley Earth", Berkeley Earth, 2016. [Online]. Available: http://berkeleyearth.org/.
[2] RStudio Team (2015). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA. URL http://www.rstudio.com/
[3] R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/
[4] B. Watson, "Climate Change: Earth Surface Temperature Data | Kaggle", Kaggle.com, 2016. [Online]. Available: https://www.kaggle.com/benwatson/d/berkeleyearth/climate-change-earth-surface-temperature-data/is-your-city-getting-warmer/code.
[5] M. Dowle, A. Srinivasan, T. Short, S. Lianoglou with contributions from R. Saporta and E. Antonyan (2015). data.table: Extension of Data.frame. R package version 1.9.6. http://CRAN.R-project.org/package=data.table.
[6] J. Alejandro, "Climate Change: Earth Surface Temperature Data | Kaggle", Kaggle.com, 2016. [Online]. Available: https://www.kaggle.com/jagelves/d/berkeleyearth/climate-change-earth-surface-temperature-data/continental-us-climate-change-1850-2013.
[7] M. Arora, N. K. Goel and P. Singh (2005). Evaluation of temperature trends over India / Evaluation de tendances de température en Inde. Hydrological Sciences Journal, 50:1, 81-93, DOI: 10.1623/hysj.50.1.81.56330.
[8] K. Abhishek, M. Singh, S. Ghosh and A. Anand, "Weather Forecasting Model using Artificial Neural Network", Procedia Technology, vol. 4, pp. 311-318, 2012.
[9] H. Wickham, R. Francois (2016). dplyr: A Grammar of Data Manipulation. R package version 0.5.0. http://CRAN.R-project.org/package=dplyr.
[10] H. Wickham (2016). tidyr: Easily Tidy Data with `spread()` and `gather()` Functions. R package version 0.6.0. http://CRAN.R-project.org/package=tidyr.
[11] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.
[12] D. Abdo, L. Bellchambers and S. Evans, "Turning up the Heat: Increasing Temperature and Coral Bleaching at the High Latitude Coral Reefs of the Houtman Abrolhos Islands", PLoS ONE, vol. 7, no. 8, p. e43878, 2012.
[13] S. Kundu, D. Khare and A. Mondal, "Future changes in rainfall, temperature and reference evapotranspiration in the central India by Least Square Support Vector Machine", Geoscience Frontiers, 2016.
[14] A. Ghosh and P. Joshi, "Hyperspectral imagery for disaggregation of land surface temperature with selected regression algorithms over different land use land cover scenes", ISPRS Journal of Photogrammetry and Remote Sensing, vol. 96, pp. 76-93, 2014.
