Ibd Manual

PROJECT MEMBERS
Osama Bin Shakeel(58486)

Hammad Iqbal(586216)
Contents
INTRODUCTION:3
INSTALLATION:4
DOWNLOAD PYCHARM4
INSATALLTION OF LIBRARIESError! Bookmark not defined.
TIME SERIES CODEError! Bookmark not defined.

INTRODUCTION:
PyCharm is the most popular IDE used for Python scripting language.
PyCharm Edu is based on the IntelliJ Platform, which means that all the
productivity features such as smart code completion, code inspections,
visual debugger, and more not only boost your learning productivity, but
later help you to switch easily and seamlessly to other Jet Brains tools.
When coding in Python, queries are normal for a developer. You can check
the last commit easily in PyCharm as it has the blue sections that can define
the difference between the last commit and the current one. All the
installed packages are displayed with proper visual representation. This
includes list of installed packages and the ability to search and add new
packages.
Time series is a sequence of data points in chronological sequence, most
often gathered in regular intervals. Time series analysis can be applied to
any variable that changes over time and generally speaking, usually data
points that are closer together are more similar than those further apart.
Most often, the components of time series data will include a trend,
seasonality, noise or randomness, a curve, and the level. Before we go on to
defining these terms. most business data will likely contain seasonality
such as retail sales peaking in the fourth quarter.
‘Time’ is the most important factor which ensures success in a business. It’s
difficult to keep up with the pace of time. But, technology has developed
some powerful methods using which we can ‘see things’ ahead of time.
INSTALLATION:
In order to install PyCharm on Ubuntu, we will need to get some
prerequisites out of the way first. For this exercise, make sure that your
service is fully updated.
# apt update && apt upgrade -y
Next, we will need to ensure we have Java 8 installed.

# apt install openjdk-8-jdk -y
Next, we will need to ensure we have Python 2.7 installed.

# apt install python -y
In the next section, we will begin installing Cassandra
DOWNLOAD PYCHARM
We will download the latest version of PyCharm, which as of this writing is

version 2018 3.1 OR by directly download it from the website using
browser.
https://www.jetbrains.com/pycharm/download/#section=linux
Untar the package and move to a better home

# tar pycharm2018-3.1-bin.tar.gz
# mv pycharm2018-3.1/usr/local/pycharm
Next, we will run shell file of PyCharm.
ubuntu@ubuntu:~/Downloads/pycharm-community-2018.3.1/bin$ ./pycharm.sh
After that we have to install/import libraries for Times Series algorithm by

using terminal.
# sudo apt-get install python-pip
# sudo pip install --upgrade pip
# sudo apt-get install python-setuptools python-dev build essential
# sudo easy_install pip
# sudo apt-get install python-matplotlib
# pip install matplotlib
# pip install pandas
# pip install numpy
# pip install seaborn
Times Series Code

Here we’ll learn to handle time series data. Our scope will be restricted to data
exploring in a time series type of data set.
we have used an inbuilt data set called Diabetic patient. The dataset consists
of monthly totals of international airline passengers, 1991 to 1920.
Data Set url:
https://archive.ics.uci.edu/ml/datasets/diabetes?fbclid=IwAR0Ar5XAxSoRO2U4RKEVgLA
eMqrDP6ph9MHUeBklIO4-DKOrTDmqlAJvMVA
import data as data

import pandas as pd
from statsmodels.tsa.stattools import adfuller
import numpy as np
import matplotlib.pylab as plt
import datetime
import seaborn as sns
#%matplotlib inline
sns.set()
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 6
#dateparse = lambda dates: pd.datetime.strptime(dates, '%m/%d/%Y h:mm')
data = pd.read_csv('diabetes_dataset.csv', parse_dates=['datetime'],

index_col='datetime' )
print (data.head())
print (data.dtypes)
print (data.index)
ts = data[['code']].head(30)
#------------print(ts['1991'])
print("----------------- Checking Stationarity of Time Series----------")
plt.plot(ts)
plt.show()
def test_stationarity(timeseries):
# Determingrollingstatistics bulb change zaemm

rolmean = pd.DataFrame.rolling(timeseries,window=6,center=False).mean()
rolstd = pd.DataFrame.rolling(timeseries,window=6,center=False).std()
#plot rolling stats
orig = plt.plot(timeseries, color='blue', label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label='Rolling Std')
plt.legend(loc='best')
plt.title('Rolling Mean & Standard Deviation')
plt.show(block=True)
#Perform Dickey-Fuller test:
print('Results of Dickey-Fuller Test:')
dftest = adfuller(timeseries.iloc[:,0].values, autolag='AIC' )
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value',

'#Lags Used', 'Number of Observations Used'])
for key, value in dftest[4].items():
dfoutput['Critical Value (%s)' % key] = value
print(dfoutput)
#test_stationarity(ts)
print("------------- Making TimeSeries Stationary ----------------------")
print(" 1) Estimating & Eliminating Trend ")
ts_log = np.log(ts)
plt.plot(ts_log)
print(" 1)a) exponentially weighted moving average ")
expwighted_avg =
pd.DataFrame.ewm(ts_log,halflife=6,min_periods=0,adjust=True,ignore_na=False)
.mean()
plt.plot(ts_log)
plt.plot(expwighted_avg, color='red')
plt.show()
ts_log_ewma_diff = ts_log - expwighted_avg
test_stationarity(ts_log_ewma_diff)
print("------------------------------------------------------")
print(" Difference ")
ts_log_diff = ts_log - ts_log.shift()
plt.plot(ts_log_diff)
ts_log_diff.dropna(inplace=True)
test_stationarity(ts_log_diff)
print(" Decomposition ")
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(ts_log,freq=10)
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
plt.subplot(411)
plt.plot(ts_log, label='Original')
plt.subplot(412)
plt.plot(trend, label='Trend')
plt.subplot(413)
plt.plot(seasonal,label='Seasonality')
plt.subplot(414)
plt.plot(residual, label='Residuals')
plt.tight_layout()
print("--------trend and seasonality have been removed and residual send to

dickey fuller test")
ts_log_decompose = residual
ts_log_decompose.dropna(inplace=True)
test_stationarity(ts_log_decompose)
print("----For p and q values Ploting ACF and PACF")
from statsmodels.tsa.stattools import acf, pacf
lag_acf = acf(ts_log_diff, nlags=20)

lag_pacf = pacf(ts_log_diff, nlags=20, method='ols')
plt.subplot(121)
plt.plot(lag_acf)
plt.axhline(y=0,linestyle='--',color='gray')
plt.axhline(y=-1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')
plt.axhline(y=1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')
plt.title('Autocorrelation Function')
plt.subplot(122)
plt.plot(lag_pacf)
plt.axhline(y=0,linestyle='--',color='gray')
plt.axhline(y=-1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')
plt.axhline(y=1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')
plt.title('Partial Autocorrelation Function')
plt.tight_layout()
plt.show()
import time
from statsmodels.tsa.arima_model import ARIMA
print("----------AR MODEL-------------")
model = ARIMA(ts_log, order=(2, 1, 0))
results_AR = model.fit(disp=-1)
plt.plot(results_AR.fittedvalues, color='red')
plt.show()
print("--------------MA MODEL------------")
results_MA = model.fit(disp=-1)
plt.plot(results_MA.fittedvalues, color='red')
plt.show()
print("-----------Combined Model----------------")
results_ARIMA = model.fit(disp=-1)
plt.plot(results_ARIMA.fittedvalues, color='red')
plt.show()
predictions_ARIMA_diff = pd.Series(results_ARIMA.fittedvalues, copy=True)
print(predictions_ARIMA_diff.head())
predictions_ARIMA_diff_cumsum = predictions_ARIMA_diff.cumsum()
print( predictions_ARIMA_diff_cumsum.head())
predictions_ARIMA_log = pd.Series(ts_log.ix[0], index=ts_log.index)
predictions_ARIMA_log =
predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum,fill_value=0)
print(predictions_ARIMA_log.head())
predictions_ARIMA = np.exp(predictions_ARIMA_log)
plt.plot(ts)
plt.plot(predictions_ARIMA)
plt.show()

Ibd Manual

Uploaded by

Document Information

Copyright

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Ibd Manual

Uploaded by

Copyright:

PROJECT MEMBERS

Osama Bin Shakeel(58486)

INSATALLTION OF LIBRARIESError! Bookmark not defined.

TIME SERIES CODEError! Bookmark not defined.

Next, we will need to ensure we have Java 8 installed.

Next, we will need to ensure we have Python 2.7 installed.

In the next section, we will begin installing Cassandra

We will download the latest version of PyCharm, which as of this writing is

Untar the package and move to a better home

After that we have to install/import libraries for Times Series algorithm by

Times Series Code

import data as data

from statsmodels.tsa.stattools import adfuller

import matplotlib.pylab as plt

import seaborn as sns

from matplotlib.pylab import rcParams

#dateparse = lambda dates: pd.datetime.strptime(dates, '%m/%d/%Y h:mm')

data = pd.read_csv('diabetes_dataset.csv', parse_dates=['datetime'],

print("----------------- Checking Stationarity of Time Series----------")

# Determingrollingstatistics bulb change zaemm

#plot rolling stats

orig = plt.plot(timeseries, color='blue', label='Original')

mean = plt.plot(rolmean, color='red', label='Rolling Mean')

std = plt.plot(rolstd, color='black', label='Rolling Std')

plt.title('Rolling Mean & Standard Deviation')

#Perform Dickey-Fuller test:

print('Results of Dickey-Fuller Test:')

dftest = adfuller(timeseries.iloc[:,0].values, autolag='AIC' )

dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value',

for key, value in dftest[4].items():

dfoutput['Critical Value (%s)' % key] = value

print("------------- Making TimeSeries Stationary ----------------------")

print(" 1) Estimating & Eliminating Trend ")

print(" 1)a) exponentially weighted moving average ")

ts_log_ewma_diff = ts_log - expwighted_avg

print(" Difference ")

ts_log_diff = ts_log - ts_log.shift()

print(" Decomposition ")

from statsmodels.tsa.seasonal import seasonal_decompose

print("--------trend and seasonality have been removed and residual send to

print("----For p and q values Ploting ACF and PACF")

from statsmodels.tsa.stattools import acf, pacf

lag_acf = acf(ts_log_diff, nlags=20)

plt.title('Partial Autocorrelation Function')

from statsmodels.tsa.arima_model import ARIMA

model = ARIMA(ts_log, order=(2, 1, 0))

model = ARIMA(ts_log, order=(0, 1, 1))

model = ARIMA(ts_log, order=(2, 1, 2))

predictions_ARIMA_diff = pd.Series(results_ARIMA.fittedvalues, copy=True)

predictions_ARIMA_log = pd.Series(ts_log.ix[0], index=ts_log.index)

You might also like