You are on page 1of 12

PROJECT MEMBERS

Osama Bin Shakeel(58486)


Hammad Iqbal(586216)
Contents
INTRODUCTION:3

INSTALLATION:4

DOWNLOAD PYCHARM4

INSATALLTION OF LIBRARIESError! Bookmark not defined.

TIME SERIES CODEError! Bookmark not defined.


INTRODUCTION:
PyCharm is the most popular IDE used for Python scripting language.
PyCharm Edu is based on the IntelliJ Platform, which means that all the
productivity features such as smart code completion, code inspections,
visual debugger, and more not only boost your learning productivity, but
later help you to switch easily and seamlessly to other Jet Brains tools.
When coding in Python, queries are normal for a developer. You can check
the last commit easily in PyCharm as it has the blue sections that can define
the difference between the last commit and the current one. All the
installed packages are displayed with proper visual representation. This
includes list of installed packages and the ability to search and add new
packages.
Time series is a sequence of data points in chronological sequence, most
often gathered in regular intervals. Time series analysis can be applied to
any variable that changes over time and generally speaking, usually data
points that are closer together are more similar than those further apart.

Most often, the components of time series data will include a trend,
seasonality, noise or randomness, a curve, and the level. Before we go on to
defining these terms. most business data will likely contain seasonality
such as retail sales peaking in the fourth quarter.

‘Time’ is the most important factor which ensures success in a business. It’s
difficult to keep up with the pace of time. But, technology has developed
some powerful methods using which we can ‘see things’ ahead of time.
INSTALLATION:
In order to install PyCharm on Ubuntu, we will need to get some
prerequisites out of the way first. For this exercise, make sure that your
service is fully updated.
# apt update && apt upgrade -y

Next, we will need to ensure we have Java 8 installed.


# apt install openjdk-8-jdk -y

Next, we will need to ensure we have Python 2.7 installed.


# apt install python -y

In the next section, we will begin installing Cassandra

DOWNLOAD PYCHARM

We will download the latest version of PyCharm, which as of this writing is


version 2018 3.1 OR by directly download it from the website using
browser.
https://www.jetbrains.com/pycharm/download/#section=linux

Untar the package and move to a better home


# tar pycharm2018-3.1-bin.tar.gz

# mv pycharm2018-3.1/usr/local/pycharm
Next, we will run shell file of PyCharm.

ubuntu@ubuntu:~/Downloads/pycharm-community-2018.3.1/bin$ ./pycharm.sh

After that we have to install/import libraries for Times Series algorithm by


using terminal.
# sudo apt-get install python-pip
# sudo pip install --upgrade pip
# sudo apt-get install python-setuptools python-dev build essential
# sudo easy_install pip
# sudo apt-get install python-matplotlib
# pip install matplotlib
# pip install pandas
# pip install numpy
# pip install seaborn

Times Series Code


Here we’ll learn to handle time series data. Our scope will be restricted to data
exploring in a time series type of data set.
we have used an inbuilt data set called Diabetic patient. The dataset consists
of monthly totals of international airline passengers, 1991 to 1920.
Data Set url:
https://archive.ics.uci.edu/ml/datasets/diabetes?fbclid=IwAR0Ar5XAxSoRO2U4RKEVgLA
eMqrDP6ph9MHUeBklIO4-DKOrTDmqlAJvMVA

import data as data


import pandas as pd

from statsmodels.tsa.stattools import adfuller

import numpy as np

import matplotlib.pylab as plt

import datetime

import seaborn as sns

#%matplotlib inline

sns.set()

from matplotlib.pylab import rcParams

rcParams['figure.figsize'] = 15, 6

#dateparse = lambda dates: pd.datetime.strptime(dates, '%m/%d/%Y h:mm')

data = pd.read_csv('diabetes_dataset.csv', parse_dates=['datetime'],


index_col='datetime' )

print (data.head())

print (data.dtypes)

print (data.index)

ts = data[['code']].head(30)

#------------print(ts['1991'])

print("----------------- Checking Stationarity of Time Series----------")

plt.plot(ts)

plt.show()

def test_stationarity(timeseries):

# Determingrollingstatistics bulb change zaemm


rolmean = pd.DataFrame.rolling(timeseries,window=6,center=False).mean()

rolstd = pd.DataFrame.rolling(timeseries,window=6,center=False).std()

#plot rolling stats

orig = plt.plot(timeseries, color='blue', label='Original')

mean = plt.plot(rolmean, color='red', label='Rolling Mean')

std = plt.plot(rolstd, color='black', label='Rolling Std')

plt.legend(loc='best')

plt.title('Rolling Mean & Standard Deviation')

plt.show(block=True)

#Perform Dickey-Fuller test:

print('Results of Dickey-Fuller Test:')

dftest = adfuller(timeseries.iloc[:,0].values, autolag='AIC' )

dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value',


'#Lags Used', 'Number of Observations Used'])

for key, value in dftest[4].items():

dfoutput['Critical Value (%s)' % key] = value

print(dfoutput)

#test_stationarity(ts)

print("------------- Making TimeSeries Stationary ----------------------")

print(" 1) Estimating & Eliminating Trend ")

ts_log = np.log(ts)
plt.plot(ts_log)

print(" 1)a) exponentially weighted moving average ")

expwighted_avg =
pd.DataFrame.ewm(ts_log,halflife=6,min_periods=0,adjust=True,ignore_na=False)
.mean()

plt.plot(ts_log)

plt.plot(expwighted_avg, color='red')

plt.show()

ts_log_ewma_diff = ts_log - expwighted_avg

test_stationarity(ts_log_ewma_diff)

print("------------------------------------------------------")

print(" Difference ")

ts_log_diff = ts_log - ts_log.shift()

plt.plot(ts_log_diff)

ts_log_diff.dropna(inplace=True)

test_stationarity(ts_log_diff)

print(" Decomposition ")

from statsmodels.tsa.seasonal import seasonal_decompose

decomposition = seasonal_decompose(ts_log,freq=10)

trend = decomposition.trend

seasonal = decomposition.seasonal
residual = decomposition.resid

plt.subplot(411)

plt.plot(ts_log, label='Original')

plt.legend(loc='best')

plt.subplot(412)

plt.plot(trend, label='Trend')

plt.legend(loc='best')

plt.subplot(413)

plt.plot(seasonal,label='Seasonality')

plt.legend(loc='best')

plt.subplot(414)

plt.plot(residual, label='Residuals')

plt.legend(loc='best')

plt.tight_layout()

print("--------trend and seasonality have been removed and residual send to


dickey fuller test")

ts_log_decompose = residual

ts_log_decompose.dropna(inplace=True)

test_stationarity(ts_log_decompose)

print("----For p and q values Ploting ACF and PACF")

from statsmodels.tsa.stattools import acf, pacf

lag_acf = acf(ts_log_diff, nlags=20)


lag_pacf = pacf(ts_log_diff, nlags=20, method='ols')

plt.subplot(121)

plt.plot(lag_acf)

plt.axhline(y=0,linestyle='--',color='gray')

plt.axhline(y=-1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')

plt.axhline(y=1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')

plt.title('Autocorrelation Function')

plt.subplot(122)

plt.plot(lag_pacf)

plt.axhline(y=0,linestyle='--',color='gray')

plt.axhline(y=-1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')

plt.axhline(y=1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')

plt.title('Partial Autocorrelation Function')

plt.tight_layout()

plt.show()

import time

from statsmodels.tsa.arima_model import ARIMA

print("----------AR MODEL-------------")

model = ARIMA(ts_log, order=(2, 1, 0))

results_AR = model.fit(disp=-1)

plt.plot(ts_log_diff)

plt.plot(results_AR.fittedvalues, color='red')

plt.show()
print("--------------MA MODEL------------")

model = ARIMA(ts_log, order=(0, 1, 1))

results_MA = model.fit(disp=-1)

plt.plot(ts_log_diff)

plt.plot(results_MA.fittedvalues, color='red')

plt.show()

print("-----------Combined Model----------------")

model = ARIMA(ts_log, order=(2, 1, 2))

results_ARIMA = model.fit(disp=-1)

plt.plot(ts_log_diff)

plt.plot(results_ARIMA.fittedvalues, color='red')

plt.show()

predictions_ARIMA_diff = pd.Series(results_ARIMA.fittedvalues, copy=True)

print(predictions_ARIMA_diff.head())

predictions_ARIMA_diff_cumsum = predictions_ARIMA_diff.cumsum()

print( predictions_ARIMA_diff_cumsum.head())

predictions_ARIMA_log = pd.Series(ts_log.ix[0], index=ts_log.index)

predictions_ARIMA_log =
predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum,fill_value=0)

print(predictions_ARIMA_log.head())
predictions_ARIMA = np.exp(predictions_ARIMA_log)

plt.plot(ts)

plt.plot(predictions_ARIMA)
plt.show()

You might also like