
Supervised Predictive Modeling Of Space Weather Events

Using
Remote Sensing Observations

PROJECT REPORT
Submitted in partial fulfillment of the requirement for the award of the degree of

Bachelor of Technology
In
Engineering Physics

By
Aman Kumar
Enrollment Number: 17122005
&
Aniket Sujay
Enrollment Number: 17122006

Under the Supervision of
Prof. M.V. Sunil Krishna

DEPARTMENT OF PHYSICS
INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
ROORKEE- 247667 (INDIA)
18TH June 2021
CANDIDATE’S DECLARATION

We hereby declare that the work which is being presented in this B.Tech Project
report entitled “Supervised predictive modeling of Space Weather Events Using
Remote Sensing Observations” in partial fulfillment of the requirements for the
award of the Degree of Bachelor of Technology in Engineering Physics and
submitted in the Department of Physics of the Indian Institute of Technology
Roorkee, is an authentic record of our own work carried out during the period of
August 2020 to June 2021 under the supervision of Prof. M.V. Sunil Krishna,
Department of Physics, Indian Institute of Technology, Roorkee.

Aman Kumar
B.Tech Engineering Physics
Enrollment No: 17122005

Aniket Sujay
B.Tech Engineering Physics
Enrollment No: 17122006
CERTIFICATE

This is to certify that the project entitled “Supervised predictive modeling of Space Weather Events Using Remote Sensing Observations” is a work carried out by Aman Kumar & Aniket Sujay, B.Tech fourth year, IIT Roorkee, in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Engineering Physics under my guidance.

Prof. M.V. Sunil Krishna

Associate Professor
Department of Physics
Indian Institute of Technology, Roorkee
ACKNOWLEDGEMENTS

We wish to extend sincere thanks and a deep sense of gratitude to our supervisor Prof. M.V. Sunil Krishna, Associate Professor at the Department of Physics, IIT Roorkee, for his guidance, encouragement, and suggestions throughout the course of this work. His experience and thoughtful ideas gave the right direction to our project work.

We owe a deep sense of gratitude to Prof. G.D. Verma, Head of the Department of Physics, Indian Institute of Technology Roorkee, for providing us the infrastructure facilities to complete this project successfully.

We would also like to thank Mr. Alok Kumar Ranjan, Research Scholar at the Department of Physics, IIT Roorkee, who helped us throughout the course of this project.
Introduction

NASA’s TIMED Mission

TIMED stands for Thermosphere Ionosphere Mesosphere Energetics & Dynamics. Launched on 7th December 2001, this satellite orbits the Earth at an altitude of 625 km above the surface. It studies our atmosphere's MLTI region (Mesosphere, Lower Thermosphere / Ionosphere).
There were several motives behind studying this
region of the atmosphere:
● Intense solar activity dumps large amounts of energy into the MLTI, causing it to expand and reach further out into space. As a result, satellites orbiting in low-Earth orbits encounter more air particles, which increase the drag and reduce their orbital velocity, so ground-based controllers have to relocate them into new orbits.
● To understand the energetics and dynamics of this region. Studying this region will help in understanding the processes behind the changes in chemistry, dynamics, and electrical properties of the upper atmosphere caused by energy and energetic particles coming from the Sun.

This will help scientists to develop predictive models of space weather's effects on satellite tracking, communications, spacecraft lifetimes, degradation of spacecraft materials upon orbiting & re-entering the Earth's atmosphere, etc.
The TIMED satellite does all of this with the help of 4 special instruments mounted on it:
● Global UltraViolet Imager (GUVI)
● Solar Extreme ultraviolet Experiment (SEE)
● TIMED Doppler Interferometer (TIDI)
● Sounding of Atmosphere using Broadband Emission Radiometry (SABER)

Fig 1: TIMED spacecraft collecting data about the MLTI region

SABER

SABER stands for Sounding of Atmosphere using Broadband Emission Radiometry. It is a multichannel infrared radiometer that measures a wide range of infrared light emitted by different molecular species in the atmosphere at different altitudes. It also measures the heat emitted by the atmosphere over a broad altitude and spectral range, as well as global temperature profiles and sources of atmospheric cooling. Its main objective is to explore the MLTI region to determine its energy balance, atmospheric structure, chemistry, and dynamics between atmospheric regions.
SABER achieves all this with global measurements of the atmosphere using a 10-channel broadband limb-scanning infrared radiometer covering the spectral range from 1.27 µm to 17 µm.
SABER measures 2 out of the 3 key infrared emissions that govern radiative cooling of the atmosphere at high altitudes (above 100 km):
● NO at 5.3 µm
● CO2 at 15 µm

All data products of SABER are divided into 2 categories: “Routine Products” and “Analysis Products”.
Routine products of SABER include:
● Ozone concentration (15-100 km, from 9.6 µm), day and night
● Ozone concentration (50-105 km, from 1.27 µm), daytime only
● Water vapor (15-80 km), day and night
● NO volume emission rate at 5.3 µm (80-100 km), day and night
● OH volume emission rate at 1.6 µm and 2.0 µm (80-100 km), day and night
● O2(¹Δ) volume emission rate at 1.27 µm (50-100 km), day and night
● CO2 (4.3 µm & 15 µm) calibrated limb radiance profile
● Calibrated limb radiance profiles for all remaining channels
● Kinetic temperature, pressure & density (10-105 km), day and night

Analysis products of SABER include:
● Constituent abundances:
○ CO2 (100-160 km)
○ H (80-100 km)
○ O (80-100 km)
● Cooling rates:
○ CO2 (15 µm)
○ NO (5.3 µm)
○ O3 (9.6 µm)
○ H2O (6.7 µm & far IR)
● Chemical heating rates (odd-oxygen and odd-hydrogen families)
● Solar heating rates of CO2, O3, and O2

The combined set of SABER data products gives a nearly complete measure of solar & chemical heating and infrared radiative cooling, along with temperature and water vapor & ozone concentrations. Our predictive model is trained on this high-quality data obtained from SABER.

Radiative Cooling in MLTI

During solar storms, large amounts of energy and particles are deposited in Earth's atmosphere. This results in some alteration of the atmospheric structure, dynamics, and composition. The energy deposited during this time is lost rapidly from the atmosphere, and heat balance occurs via infrared emissions by various molecular species. During solar storms, radiative cooling accounts for the dissipation of approximately 80% of the Joule heating and energy input.
The NO infrared emission at 5.3 µm is a dominating heat balance process in the atmosphere. This emission results from the vibrational-rotational band transition (Δν = 1, Δj = 0, ±1):

NO(X²Π, ν = 1) → NO(X²Π, ν = 0) + hν (5.3 µm)

This vibrational transition acts as an important cooling process in the 110-300 km altitude range. The cooling is due to the conversion of kinetic energy into radiative energy, which is subsequently released into space and the lower atmosphere. The dominant source for the production of this emission is the collisional vibrational excitation of NO by the impact of atomic oxygen:

NO(ν = 0) + O → NO*(ν = 1) + O    (rate coefficient α₁)

The NO emission rate is severely influenced by storm conditions. It was observed that the radiative emission rate during a storm period in the Lower Thermosphere region is larger by a few orders of magnitude than during the quiet period.

Fig 2: NO VER v/s Altitude plot

The NO emissions at 5.3 µm act as a natural thermostat in the Thermosphere. The NO radiative flux is a potential candidate to understand the Thermospheric modulation caused by space weather events.
Objective

The objective of our project is to develop predictive AI models using Artificial Neural Networks to predict:
● NO Infrared Radiative Flux (NO IRF)
● Mesospheric Ozone Density

The NO Infrared Radiative Flux is calculated by integrating the NO volume emission rate along altitude from 90-250 km.
The Ozone density is calculated by multiplying the atmospheric density and the Ozone mixing ratio obtained from the 9.6 µm emission. This density is measured in the altitude range 70-110 km.

Predictive Modelling

Regression Analysis

Regression analysis is a statistical process of estimating the relationship between a dependent variable and a set of independent variables³. It is primarily used in the field of prediction and forecasting, and its use has substantial overlap with the science of Machine Learning. Regression analysis can also be used to infer relationships between the dependent and independent variables.

Artificial Neural Networks

Numerous statistical regression models such as linear regression, polynomial regression, and support vector regression (SVR) already exist and are in wide use. But none of the above has been as versatile and powerful as artificial neural networks.
ANNs have been proven to work excellently in a wide range of problems, including regression analysis. This is because most, if not all, observations in real life come from a system that is best represented by a non-linear description. The information content that flows through the ANN graph passes through a linear combination of non-linear functions (called activation functions), which makes ANNs ideal for the study of nonlinear systems.

Regression analysis procedure

In regression, we start with an assumption about the form of the regressor function. This function is parameterized; by adjusting the parameters we can obtain different functions. The problem is to find the best set of parameters that gives the least amount of error on a given set of examples (data).
We feed the input vectors (features) in iteratively and calculate the result. The parameters are then adjusted by some factor of the loss function. The loss function is a metric that quantifies the difference between the predicted result and the actual value. Our job is to minimize this function.
We will look at linear regression first and then extend the same concepts to ANNs.

Assume the number of data points is N, each data point is a p-dimensional vector, and the output is a scalar value. Let

X₁ = [x₁₀ … x₁ₚ] be an input vector, and
X = [X₁ … X_N]ᵀ be the matrix representing the whole dataset.

Step 1: Start with an assumption about the form of the regression equation (here, linear):

f(x) = Xᵀβ, where β is the parameter vector we have to adjust.

Step 2: Run the data through the function and calculate the result.
Step 3: Choose a loss function and then minimize it.

Loss function (expected squared error):

EPE(f) = (Y − Xᵀβ)²

To minimize the loss function:

∂EPE/∂β = 0

which will give us:

β = (XᵀX)⁻¹XᵀY

So our final equation will be:

f(x) = xᵀβ
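As a small illustration of this closed-form solution, the NumPy sketch below fits β with the normal equation on synthetic data; the array shapes and names here are placeholders, not values from our dataset.

import numpy as np

# Synthetic example: N = 200 data points, p = 3 features plus a bias column
rng = np.random.default_rng(0)
X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 3))])   # design matrix with bias term
true_beta = np.array([1.0, 0.5, -2.0, 3.0])
Y = X @ true_beta + rng.normal(scale=0.1, size=200)             # noisy targets

# beta = (X^T X)^(-1) X^T Y  (pinv is used here for numerical stability)
beta = np.linalg.pinv(X.T @ X) @ X.T @ Y

# Predictions with the fitted linear model f(x) = x^T beta
Y_pred = X @ beta
print("fitted beta:", np.round(beta, 3))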


General ANN architecture for regression analysis

Any kind of neural network will consist of three parts:
1. Input Layer: These nodes feed the data to the computational network.
2. Hidden Layers: These layers consist of linear combinations of nonlinear functions.
3. Output Layer: In the case of single-variable regression we need only one node.

The learning procedure for ANNs is similar to that of linear regression. Its steps are:
1. Forward Propagation
2. Loss calculation
3. Back Propagation

Fig 3: Artificial Neural Network architecture

We have a 3 X 3 X 1 architecture ANN. The first stack (layer) of nodes is called the input layer; the number of nodes in this layer is equal to the number of input parameters.
The hidden layer is what represents the regression function. It can be arbitrarily large; bigger neural networks are often called Deep Neural Networks.
For the output layer we only have one node, because we are dealing with single-variable regression.

Training a Neural Network

Neural Network Description:

We will consider a 3 X 3 X 1 architecture. Input vectors will be of the form

x = [x₁ … xₚ]

As neural networks model non-linear systems, we must introduce non-linearity into the graph. To do this we assign each connection between nodes a weight. For the weights between the input and the first hidden layer we have the weight matrix W.
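For reference, a 3 X 3 X 1 network of the kind described above can be written in a few lines of Keras. This is only an illustrative sketch under assumed settings (sigmoid activation, plain SGD, mean squared error); the models used later in this report are larger and tuned separately.

from tensorflow import keras
from tensorflow.keras import layers

# 3 inputs -> 3 hidden nodes with a nonlinear activation -> 1 output node
model = keras.Sequential([
    keras.Input(shape=(3,)),
    layers.Dense(3, activation="sigmoid"),   # hidden layer: sigma(Wx + w0)
    layers.Dense(1)                          # single output node for regression
])
model.compile(optimizer="sgd", loss="mse")
model.summary()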
If we just feed the linear combination of these weights with the input to the succeeding nodes, we will essentially get a linear model. So we pass the output of the nodes through a nonlinear function called an activation function:

z = σ(Wx + w₀)

where σ is the non-linear activation function.

Step 1: Feed-Forward Propagation

We present the equations for a single pass of the forward propagation.

For the first layer:

zᵢ = σ( Σⱼ₌₁⁴ wᵢⱼ xⱼ + wᵢ₀ )

For the second layer:

tᵢ = σ( Σⱼ₌₁³ βᵢⱼ zⱼ + βᵢ₀ )

For the output layer:

y = Σⱼ₌₁³ ηⱼ tⱼ

Step 2: Calculation of the cost function

For our regression task, we will use the squared-error loss function:

R = Σᵢ₌₁ᴺ (yᵢ − f(xᵢ))²

Step 3: Backpropagation

The backpropagation equations are given below. These equations move the model towards a local minimum on the error surface; this technique is called stochastic gradient descent.

ηᵢ ← ηᵢ − α ∂R/∂ηᵢ
βᵢⱼ ← βᵢⱼ − α ∂R/∂βᵢⱼ
wᵢⱼ ← wᵢⱼ − α ∂R/∂wᵢⱼ

We now apply the chain rule to calculate the derivatives:

∂R/∂βᵢⱼ = (∂R/∂tᵢ)(∂tᵢ/∂βᵢⱼ)
∂R/∂tᵢ = (∂R/∂ηᵢ)(∂ηᵢ/∂tᵢ)
∂R/∂wᵢⱼ = (∂R/∂zᵢₖ)(∂zᵢₖ/∂wᵢⱼ)
∂R/∂zᵢₖ = Σₗ₌₁³ (∂R/∂tₗ)(∂tₗ/∂zᵢₖ)

Using these equations we can calculate the adjustments to the parameters.
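The sketch below works through one training step of the 3 X 3 X 1 network in NumPy, following the forward-propagation equations and the stochastic-gradient-descent updates above. A sigmoid activation is assumed and the bias updates are omitted for brevity; it is meant only to make the chain-rule bookkeeping concrete, not to reproduce our actual training code.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W, w0 = rng.normal(size=(3, 3)), np.zeros(3)   # input -> first hidden layer
B, b0 = rng.normal(size=(3, 3)), np.zeros(3)   # first -> second hidden layer (the beta weights)
eta = rng.normal(size=3)                       # second hidden layer -> output
alpha = 0.05                                   # learning rate

x, y_true = rng.normal(size=3), 1.2            # one (input, target) example

# Forward propagation
z = sigmoid(W @ x + w0)
t = sigmoid(B @ z + b0)
y = eta @ t
R = (y_true - y) ** 2                          # squared-error loss

# Backpropagation (chain rule), then the gradient-descent updates
dR_dy = 2.0 * (y - y_true)
dR_deta = dR_dy * t
dR_dpre_t = dR_dy * eta * t * (1 - t)          # through the sigmoid of the second layer
dR_dB = np.outer(dR_dpre_t, z)
dR_dpre_z = (B.T @ dR_dpre_t) * z * (1 - z)    # through the sigmoid of the first layer
dR_dW = np.outer(dR_dpre_z, x)

eta -= alpha * dR_deta                         # bias updates omitted for brevity
B -= alpha * dR_dB
W -= alpha * dR_dW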
Performance Tuning

The performance of an ANN depends on quite a few parameters. Some of the most common are:

1. Size of the dataset: In the field of neural networks, more data is always better. More data points allow a better fit.
2. Dimensionality of the input variables: High-dimensional data is difficult to train on, as it requires more data points to properly fit the model.
3. Learning rate: This determines the amount by which we adjust our parameter values during backpropagation.
4. Optimizer functions: There are many optimizer functions apart from vanilla gradient descent, e.g. the Adam optimizer and RMSprop, which can prevent the model from getting stuck in an un-optimal position on the error surface (see the sketch after this list).
5. Batch Normalization: Neural networks are notoriously difficult to train. The training gets affected by the randomness present in the data or the randomness of the initialization of the parameters; this is called internal covariate shift. To reduce this we apply batch normalization, which fixes the means and variances of each layer's inputs.
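As an example of points 4 and 5, the optimizer and batch normalization are specified when a Keras model is defined and compiled. The layer sizes and learning rate below are placeholders, not our final configuration.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(10,)),        # 10 input features (placeholder)
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),     # normalizes each layer's inputs during training
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),
    layers.Dense(1)
])
# Adam (or RMSprop) instead of plain gradient descent
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")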
Autoencoders

Recently, people have been highly successful in applying encoding networks to traditional machine learning problems.
As the name suggests, this type of neural network trains on the dataset and learns how to compress the data into a lower-dimensional form. As the lower-dimensional form still preserves the information content of the original dataset, we can train another model with the compressed data as the input. It has been shown that this method sometimes increases the performance over the traditional approach.

Architecture

Fig 4: Autoencoder architecture

An autoencoder network has 3 parts:
1. Encoder network
2. Bottleneck
3. Decoder network

The encoder network's job is to compress/encode the data into a lower-dimensional form. The bottleneck layer represents the compressed data. And the decoder network's job is to decompress/decode the compressed data.

Training

To train an autoencoder we set the output variable to be the same as the input variable. This forces the network to mimic the input data: the encoder network tries to find a suitable compression, and the decoder network tries to convert the compressed form back to the original. The loss function is the difference between the input and the output of the decoder.
After training, we can isolate the encoder network and the bottleneck layer. This neural network will act as a compression model: if we feed it new data, it will give a reasonable encoding for it.
We now hook up the bottleneck layer as the input to another neural network (or some other statistical regression model) and train this new combination using the original output.
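A minimal Keras sketch of this scheme is given below, assuming 10 input features and a 4-dimensional bottleneck (both placeholder numbers). After training, the encoder up to the bottleneck is isolated and its output can feed a downstream regression model.

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(10,))
encoded = layers.Dense(8, activation="relu")(inputs)
bottleneck = layers.Dense(4, activation="relu", name="bottleneck")(encoded)  # compressed representation
decoded = layers.Dense(8, activation="relu")(bottleneck)
outputs = layers.Dense(10)(decoded)                                          # reconstruct the input

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_train, X_train, ...)   # target = input, so the network learns to reconstruct

# Isolate the encoder + bottleneck and use its output as features for another model
encoder = keras.Model(inputs, bottleneck)
# X_train_encoded = encoder.predict(X_train)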
Fig 5: Feeding the output of the Encoder to a Neural Network

Dataset Description

Our predictive model is trained on a dataset obtained from SABER and the WDC Kyoto website.

● From SABER:
○ Event, Solar AP, Solar KP, Solar F10.7 Index, Solar Zenith Angle, Time, Latitude, Longitude, Altitude, Kinetic Temperature
○ NO Volume Emission Rate, Atmospheric Density, Ozone Mixing Ratio at 9.6 & 1.27 µm
● From WDC Kyoto:
○ DST Index, AE Index & Symmetric H component

The DST Index has a 1-hour resolution; the AE Index & Symmetric H component have a 1-minute resolution. These three are combined with the SABER data with the help of their time values.
Data for the months of October 2003, November 2003 & November 2004 is used, because of the solar storm events during these months.
Heat Map

To see the correlations between our features we have used a heat map. We calculated the correlation coefficients between our features to form a matrix and then created a heat map based on it. The intensity of the colors shows the magnitude of the correlation coefficient.
Highly correlated features are redundant and create hurdles in training the model. Because of them, the dimensionality of our dataset increases unnecessarily, so it is necessary to eliminate these highly correlated features from our dataset.
We kept the threshold value at 0.74. Features having a magnitude of correlation coefficient higher than 0.74 are dropped from the dataset (a short pandas sketch of this step is given after the list below).

Fig 6: Heat Map

Dropped features are:
● event
● solKp (Solar Kp)
● sym_h (Symmetric H Component)
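The heat map and the 0.74 threshold filtering can be sketched with pandas and seaborn as follows; here df is assumed to be the merged SABER / WDC Kyoto dataframe, and one feature of each highly correlated pair is dropped.

import seaborn as sns
import matplotlib.pyplot as plt

corr = df.corr(numeric_only=True)              # correlation-coefficient matrix
sns.heatmap(corr, cmap="coolwarm", center=0)   # color intensity = magnitude of the coefficient
plt.show()

# Drop one feature from every pair whose |correlation| exceeds the 0.74 threshold
threshold = 0.74
to_drop = set()
cols = corr.columns
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        if abs(corr.iloc[i, j]) > threshold:
            to_drop.add(cols[j])
df = df.drop(columns=list(to_drop))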
Data Preprocessing:

Before feeding data to the model, some preprocessing is done to facilitate the learning.
The data is divided into 3 sets: a Train Set, a Validation Set & a Test Set.
● Train Set: It contains the data on which our model is trained.
● Validation Set: It contains the data on which our model is tested during training.
● Test Set: It contains the data on which our fully trained model is tested. It helps us measure the performance of our model on unseen data.

The next step is data scaling. It is done to bring all the data into the same range. This is important because, if it is not done, our model will give more importance to the features having large values and neglect the features with small values. We have used the MinMax scaler to scale our data.

Latent state representations learned by our autoencoder are used to encode our data. This encoding ensures an optimal representation of our data. Finally, this encoded data is used to train our predictive model.
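A scikit-learn sketch of the splitting and scaling steps is shown below. The split ratios are illustrative, and X and y stand for the feature matrix and the target variable.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Split into train / validation / test sets (e.g. 70 / 15 / 15; the exact ratios are placeholders)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=42)

# Fit the MinMax scaler on the training set only, then apply it everywhere
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)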
Calculation of NO_IRF (Infrared Radiative Flux):

The SABER dataset provides us with the NO volume emission rate. This is integrated along altitude to get the NO_irf value: we integrated the NO_ver value for each event from 90 km to 250 km. We take the average of the latitude, longitude, and time variables.
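A sketch of this integration for a single event is given below, assuming altitude_km and no_ver are arrays holding the altitudes and the corresponding NO volume emission rates of that event's profile.

import numpy as np

# Keep only the 90-250 km altitude range of the profile
mask = (altitude_km >= 90.0) & (altitude_km <= 250.0)

# Integrate the NO volume emission rate along altitude (trapezoidal rule)
no_irf = np.trapz(no_ver[mask], x=altitude_km[mask])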
Features used for training the model:
1. ae_index (since the sym_h index showed a high correlation with this, sym_h was dropped)
2. dst_index
3. solAP
4. solKP
5. solf10p7Daily
6. time
7. tpSolarZen
8. tplatitude
9. tplongitude
10. tpaltitude

Results:

The test and training set error goes down exponentially with each epoch through the dataset. Here we used Adam and RMSprop to optimize the error function.

Fig 7: MSE Plot during training

The most popular method of evaluating the performance of a regression model is the R-squared metric:

R² = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²

This value lies in [0, 1].

This shows that our model can convincingly calculate no_irf values.
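Once the trained model's predictions on the test set are available, the metric can be computed directly from this formula or with scikit-learn; y_test and y_pred below are assumed arrays of actual and predicted values.

import numpy as np
from sklearn.metrics import r2_score

ss_res = np.sum((y_test - y_pred) ** 2)            # residual sum of squares
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)   # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

r2 = r2_score(y_test, y_pred)                      # same value via scikit-learn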
Fig 8: Actual v/s Predicted values plot

Fig 9: Predicted values and Actual values plotted for 50 data points from the test set

Calculation of Ozone Density:

The SABER dataset provides the atmospheric density and the Ozone mixing ratio at 9.6 µm. The Ozone density is obtained by multiplying the two in the 70-110 km altitude range.
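A sketch of this step is shown below, reusing the merged dataframe df assumed earlier; the column names density and O3_96 for the atmospheric density and the 9.6 µm ozone mixing ratio are hypothetical.

# Restrict to the 70-110 km altitude range and form the ozone density
mlt = df[(df["tpaltitude"] >= 70.0) & (df["tpaltitude"] <= 110.0)].copy()
mlt["O3_density"] = mlt["density"] * mlt["O3_96"]   # ozone density = atmospheric density x mixing ratio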

Features used in training the model:
1. Date
2. solAp
3. solf10p7Daily
4. time
5. tpaltitude
6. tplatitude
7. tplongitude
8. tpSolarZen
9. ktemp
10. ae_index
11. dst_index

Results:

The test and training set error goes down exponentially with each epoch through the dataset. Here Adam and RMSprop were used to optimize the error function.

Fig 10: MSE Plot during training

The most popular method of evaluating the performance of a regression model is the R-squared metric. This value represents the portion of the variance of the dependent variable that is explained by the variables in the regression model. It lies in the range [0, 1], and values close to 1 are considered pretty good.
This value shows that our model performs very well in calculating the Ozone density. We can also see this in the graph of actual and predicted O3 density for 100 data points from our test set.

Fig 11: Actual v/s Predicted values plot

Fig 12: Predicted values and Actual values plotted for 100 data points from the test set
References:

1. Fig 1 source: NASA Facts FS-2001-09-026-GSFC
2. Fig 2 source: Gaurav Bharti, M. V. Sunil Krishna, T. Bag and Puneet Jain, "Storm Time Variation of Radiative Cooling by Nitric Oxide as Observed by TIMED-SABER and GUVI", https://doi.org/10.1002/2017JA024576
3. http://wdc.kugi.kyoto-u.ac.jp
4. http://saber.gats-inc.com/
5. https://www.researchgate.net/publication/322878526_Storm-time_variation_of_radiative_cooling_by_Nitric_Oxide_as_observed_by_TIMED-SABER_and_GUVI_STORM-TIME_VARIATION_OF_NO_RADIATIVE_EMISSION
6. https://doi.org/10.1016/S0273-1177(97)00769-2
7. https://doi.org/10.1016/0273-1177(95)00739-2
8. https://www.nasa.gov/centers/goddard/pdf/110905main_FS-2001-09-026-GSFC-TIMED.pdf
9. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.458.4175&rep=rep1&type=pdf
10. https://keras.io/
11. https://machinelearningmastery.com/autoencoder-for-classification/
