import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, mean_absolute_error
from statsmodels.tsa.arima_model import ARIMA
from datetime import date
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from pmdarima import auto_arima
from datetime import timedelta
import investpy
import scipy.stats as stats
from statistics import stdev
from pandas.plotting import lag_plot
import math
Pandas: pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both easy and
intuitive. It aims to be the fundamental high-level building block for doing practical, real-world
data analysis in Python.
Numpy: NumPy is a Python library used for working with arrays. It also has functions for
working in the domains of linear algebra, Fourier transforms, and matrices. NumPy was created in
2005 by Travis Oliphant. NumPy stands for Numerical Python.
Matplotlib: Matplotlib is a visualization library in Python for 2D plots of arrays.
One of the greatest benefits of visualization is that it presents large amounts of data in
easily digestible visuals. Matplotlib supports several plot types such as line, bar, scatter,
and histogram.
Sklearn: scikit-learn (sklearn) is a widely used library for machine learning in Python. It
contains many efficient tools for machine learning and statistical modeling, including
classification, regression, clustering and dimensionality reduction.
Statsmodels: statsmodels allows users to explore data, perform statistical tests
and estimate statistical models. It complements SciPy's stats module and is part of the
Python scientific stack for data science, statistics and data analysis.
Datetime: Dates and datetimes are objects in Python, so when you manipulate them, you are
manipulating objects and not strings or timestamps. A datetime is a combination of a
date and a time, with the attributes year, month, day, hour, minute, second, microsecond,
and tzinfo.
Adfuller: The adfuller function returns a tuple of statistics from the ADF test such as the
Test Statistic, P-Value, Number of Lags Used, Number of Observations used for the ADF
regression and a dictionary of Critical Values.
ARIMA: ARIMA is an acronym that stands for AutoRegressive Integrated Moving Average.
It is a class of model that captures a suite of different standard temporal structures in time
series data. In this project it is used for time series forecasting in Python.
Investpy: investpy is a Python package to retrieve data from Investing.com, which
provides data retrieval from up to 39952 stocks, 82221 funds, 11403 ETFs, 2029 currency
crosses, 7797 indices, 688 bonds, 66 commodities, 250 certificates, and 4697
cryptocurrencies.
Seasonal decompose: The statsmodels library provides an implementation of the naive, or
classical, decomposition method in a function called seasonal_decompose(). It requires that
you specify whether the model is additive or multiplicative. ... The seasonal_decompose()
function returns a result object.
Scipy: SciPy is an open-source Python library used for solving mathematical,
scientific, engineering, and technical problems. It allows users to manipulate the data and
visualize the data using a wide range of high-level Python commands. SciPy is built on the
Python NumPy extension. SciPy is pronounced “Sigh Pi.”
Pmdarima: pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a
statistical library designed to fill the void in Python's time series analysis capabilities. This
includes a collection of statistical tests of stationarity and seasonality, and time series
utilities such as differencing and inverse differencing. It provides the auto_arima function.
Stdev: The statistics module in Python provides a function, stdev(), which calculates the
standard deviation. stdev() calculates the standard deviation from a sample of data, rather
than from an entire population. It quantifies the spread, or variation, of a set of data
values.
Lag plot: A lag plot checks whether a data set or time series is random or not. Random
data should not exhibit any identifiable structure in the lag plot. Non-random structure in the lag
plot indicates that the underlying data are not random.
Math: The math module is a standard module in Python and is always available. To use the
mathematical functions in this module, you have to import it using import math. It
gives access to the underlying C library functions.
Load the data
import mysql.connector
mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="*****",
    database="sbi_life"
)
mycursor = mydb.cursor()
mycursor.execute("SELECT Dates, Open FROM sbi")
myresult = mycursor.fetchall()
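df1 = df.drop(['High','Low','Close','Change Pct'], axis = 1)
Here we drop the columns that are not used. drop() returns a new DataFrame rather than
modifying df in place, so the result is assigned to df1.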
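The listing does not show how the fetched rows become a DataFrame. The following is a minimal sketch of one way to do it. The sample rows are hypothetical stand-ins for mycursor.fetchall(), and renaming the Open column to Price is an assumption made to match the later df1["Price"] usage.

```python
import pandas as pd

# Hypothetical rows standing in for mycursor.fetchall(); a real query
# returns a list of tuples with the same shape.
myresult = [
    ("2021-01-05", 35.27),
    ("2021-01-01", 35.10),
    ("2021-01-04", 35.42),
]

# Column names follow the SELECT statement (Dates, Open).
df = pd.DataFrame(myresult, columns=["Dates", "Open"])
df["Dates"] = pd.to_datetime(df["Dates"])

# Assumption: the analysis column is called "Price" and is indexed by date.
df1 = df.rename(columns={"Open": "Price"}).set_index("Dates").sort_index()

print(df1)
```

Indexing by date and sorting keeps the observations in chronological order, which the time series steps that follow rely on.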
print("Mean=",np.mean(df1["Price"]))
np.mean computes the arithmetic mean of the values.
Output: Mean = 35.51
print("Median=",np.median(df1["Price"]))
np.median returns the middle value of the sorted array.
Output: Median = 35.572
print(stats.mode(df1["Price"]))
stats.mode returns the modal (most common) value in the passed array, together with its count.
Output: mode = 39.935
print("Standard Deviation=",stdev(list(df1['Price'])))
stdev computes the sample standard deviation: the square root of the sum of the squared
deviations from the mean, divided by n - 1.
Output: Standard Deviation = 3.074
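To make the difference between the sample and population standard deviation concrete, here is a small sketch that computes both by hand and compares them with the statistics module. The price values are hypothetical.

```python
import math
from statistics import mean, pstdev, stdev

prices = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values

n = len(prices)
avg = mean(prices)
sum_sq = sum((p - avg) ** 2 for p in prices)

sample_sd = math.sqrt(sum_sq / (n - 1))   # what statistics.stdev computes
population_sd = math.sqrt(sum_sq / n)     # what statistics.pstdev computes

print("sample:", sample_sd, "library:", stdev(prices))
print("population:", population_sd, "library:", pstdev(prices))
```

The sample version divides by n - 1, so it is always slightly larger than the population version for the same data.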
print("Skewness:\n",df1.skew(axis=0))
Skewness is a measure of the asymmetry of the probability distribution of a real-valued random
variable about its mean. The skewness value can be positive, negative, or undefined. A
negative value indicates a longer tail on the left side of the distribution.
Output: Price -0.276881
print("Kurtosis:\n",df1.kurt(axis = 0))
The pandas DataFrame method kurt() (alias kurtosis()) computes the excess kurtosis for a set
of values along a specific axis (i.e., per column or per row). A normal distribution has an
excess kurtosis of 0, so a negative value indicates lighter tails than a normal distribution.
Output: Price -1.295515
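The following sketch shows what skew() and kurt() compute, assuming the bias-corrected sample formulas that pandas documents as "unbiased" skewness and Fisher kurtosis. The function names and the price values are hypothetical.

```python
import numpy as np
import pandas as pd


def sample_skewness(values):
    """Bias-corrected sample skewness (adjusted Fisher-Pearson)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    d = x - x.mean()
    g1 = np.mean(d ** 3) / np.mean(d ** 2) ** 1.5
    return np.sqrt(n * (n - 1)) / (n - 2) * g1


def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (0 for a normal distribution)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    d = x - x.mean()
    g2 = np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


prices = pd.Series([31.2, 33.5, 34.1, 35.6, 36.0, 36.4, 38.9, 39.9])

print("skewness:", sample_skewness(prices), "pandas:", prices.skew())
print("kurtosis:", sample_excess_kurtosis(prices), "pandas:", prices.kurt())
```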
Q-Q PLOT
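The quantile-quantile (q-q) plot is a graphical technique for determining if two data sets
come from populations with a common distribution. Here stats.probplot compares the quantiles
of the price data with those of a normal distribution, so points close to a straight line
suggest approximately normal data.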
plt.figure(figsize=(15, 10))
stats.probplot(df1["Price"], plot=plt)
plt.title("Normal Q-Q plot")
plt.ylabel("SBI Bond Prices")
plt.show()
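Besides drawing the plot, stats.probplot also returns the fitted line, which gives a rough numeric check of normality. This is a minimal sketch on synthetic samples; both samples are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=35.0, scale=3.0, size=500)
skewed_sample = rng.exponential(scale=3.0, size=500)

# Without plot=..., probplot returns the quantile pairs and the
# least-squares fit (slope, intercept, r) of the Q-Q line.
_, (slope_n, intercept_n, r_normal) = stats.probplot(normal_sample, dist="norm")
_, (_, _, r_skewed) = stats.probplot(skewed_sample, dist="norm")

print("r for normal sample:", round(r_normal, 4))
print("r for skewed sample:", round(r_skewed, 4))
```

A correlation coefficient r close to 1 means the points lie close to the reference line. For normal data the slope and intercept of that line approximate the standard deviation and the mean.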
histogram plot
A histogram represents numerical data grouped into bins. It gives a graphical
representation of the distribution of the data. It is a type of bar plot where the X-axis
represents the bin ranges while the Y-axis gives the frequency of values in each bin.
plt.figure(figsize=(15, 10))
mu = sum(list(df1['Price']))/len(df1['Price'])
sigma = stdev(list(df1['Price']))
n, bins, patches = plt.hist(df1['Price'], bins=30, facecolor='#2ab0ff', edgecolor='#e0e0e0',
                            linewidth=0.5, alpha=0.7)
n = n.astype('int')
# Colour each bar according to its height relative to the tallest bar
for i in range(len(patches)):
    patches[i].set_facecolor(plt.cm.viridis(n[i]/max(n)))
seasonality plot
A seasonal plot is very similar to the time plot, with the exception that the data is plotted
against the individual seasons. Choosing the definition of the season is up to the analyst and
in our particular case, the season is simply the month. We can generate the seasonal plot by
running the following code.
df2 = df1.copy()
plt.rcParams.update({'figure.figsize': (10,10)})
decompose_result_mult = seasonal_decompose(df2["Price"], model="multiplicative")
decompose_result_mult.plot()
autocorrelation for lag_3
Autocorrelation measures the degree of similarity of a time series with a lagged copy of
itself, which helps to reveal periodic components embedded in the data. The autocorrelation
of a series x(t) can be expressed analytically as the correlation between x(t) and x(t - k)
for a lag k; the lag plot below uses k = 3.
plt.figure(figsize=(15, 10))
lag_plot(df1['Price'], lag=3)
plt.title('SBI Bond Price - Autocorrelation plot with lag = 3')
plt.show()
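The lag plot is a visual check. The same relationship can be summarised as a single number, the correlation between the series and its lagged copy. This is a minimal sketch; the helper name lag_correlation and the two test series are hypothetical.

```python
import numpy as np
import pandas as pd


def lag_correlation(series, lag):
    """Pearson correlation between x(t) and x(t - lag)."""
    values = np.asarray(series, dtype=float)
    return float(np.corrcoef(values[lag:], values[:-lag])[0, 1])


trend = pd.Series(np.arange(100, dtype=float))   # steadily rising series
alternating = pd.Series([1.0, -1.0] * 50)        # flips sign every step

trend_r = lag_correlation(trend, 3)
alt_r = lag_correlation(alternating, 3)

print("trend, lag 3:", trend_r)
print("alternating, lag 3:", alt_r)
print("pandas autocorr:", trend.autocorr(lag=3))
```

pandas offers the same calculation as Series.autocorr(lag=3). A value near 1 or -1 indicates strong structure at that lag, while a value near 0 is what random data would show.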
scatter plot
A scatter plot is a type of plot or mathematical diagram using Cartesian coordinates to
display values for typically two variables for a set of data.
df_open = df1['Price']
df_open.plot(style='k,')
plt.title('Scatter plot of closing price')
plt.show()
Train_Test plot
plt.figure(figsize=(10,6))
plt.grid(True)
plt.xlabel('Date')
plt.ylabel('Price')
plt.plot(df1, 'red', label='Train data')
plt.plot(test_data, 'blue', label='Test data')
plt.legend()  # train, test data
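The listing uses test_data without showing how it was created. One common approach for time series is a chronological split, sketched here on a synthetic frame. The 80/20 ratio and the name train_data are assumptions, not values taken from the original project.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df1: 100 daily prices.
index = pd.date_range("2021-01-01", periods=100, freq="D")
df1 = pd.DataFrame({"Price": np.linspace(30.0, 40.0, 100)}, index=index)

# Assumption: the first 80% of observations train the model and the
# last 20% are held out. The data is not shuffled, to keep time order.
split = int(len(df1) * 0.8)
train_data = df1.iloc[:split]
test_data = df1.iloc[split:]

print(len(train_data), len(test_data))
```

Keeping the split chronological means the model is always evaluated on dates that come after everything it was trained on.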
## Model Part ##
Forecast
fc, se, conf = fitted.forecast(len(test_data), alpha=0.05) # 95% confidence
fc_series = pd.Series(fc, index=test_data.index)
lower_series = pd.Series(conf[:, 0], index=test_data.index)
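Once forecasts are available they are usually compared with the held-out values. The sketch below computes common error measures by hand. The two arrays are hypothetical, and the original project may have used sklearn's mean_squared_error and mean_absolute_error instead.

```python
import math
import numpy as np

actual = np.array([10.0, 12.0, 14.0])     # hypothetical test values
forecast = np.array([11.0, 12.0, 13.0])   # hypothetical forecasts

errors = forecast - actual
mse = float(np.mean(errors ** 2))               # mean squared error
rmse = math.sqrt(mse)                           # root mean squared error
mae = float(np.mean(np.abs(errors)))            # mean absolute error
mape = float(np.mean(np.abs(errors / actual)))  # mean absolute percentage error

print("MSE:", mse, "RMSE:", rmse, "MAE:", mae, "MAPE:", mape)
```

RMSE and MAE are expressed in the same units as the price, which makes them easier to interpret than MSE.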
import pickle
Pickle: “Pickling” is the process whereby a Python object hierarchy is converted into a byte
stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file
or bytes-like object) is converted back into an object hierarchy. Pickle in Python is primarily
used in serializing and deserializing a Python object structure. In other words, it's the
process of converting a Python object into a byte stream to store it in a file/database,
maintain program state across sessions, or transport data over the network.
filename = "bond.pkl"
with open(filename, 'wb') as f:
    pickle.dump(model2, f)
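To show the full round trip, here is a minimal sketch that saves an object and loads it back. A plain dictionary stands in for the fitted model, and the temporary directory is used only to keep the example self-contained; both are assumptions for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model object.
model_stand_in = {"order": (1, 1, 1), "params": [0.5, -0.2]}

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "bond.pkl")

    # Serialize ("pickle") the object to a binary file.
    with open(filename, "wb") as fh:
        pickle.dump(model_stand_in, fh)

    # Deserialize ("unpickle") it back into an equivalent object.
    with open(filename, "rb") as fh:
        restored = pickle.load(fh)

print(restored)
```

Only unpickle files from a trusted source, because loading a pickle can execute arbitrary code.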
Advantages of investing in SBI (Government) Bonds
The following are the advantages of investing in SBI bonds.
Risk-Free
SBI bonds promise assured returns and stability of funds to investors. They are generally
regarded as an example of a risk-free security. Thus, government bonds are suitable for
investors looking for a risk-free investment.
Returns
The returns from SBI bonds are generally as good as bank deposits. Also, there is a
guarantee of principal along with fixed interest. Unlike bank deposits, these bonds are
available for a longer duration.
One can use Scripbox’s returns calculator to estimate the returns.
Liquidity
One can buy and sell SBI bonds like equity instruments. The liquidity of these bonds is
comparable to that of instruments offered by banks and financial institutions.
Portfolio Diversification
Investment in SBI bonds makes a well-diversified portfolio for the investor. It mitigates the
risk of the overall portfolio since SBI bonds are risk-free investments.
Regular Income
As per RBI guidelines, the interest accrued on government bonds shall be disbursed every
six months to bondholders. Therefore, it provides an opportunity for the bondholders to earn
regular income by investing their idle funds.
Yield
SBI bonds, like bonds in general, have considerably lower yields than company stocks
and other competing asset classes. However, this lower return on investment is
partly balanced by the favourable risk-to-reward profile of the bond
market.