
MAKERERE UNIVERSITY

COLLEGE OF COMPUTING AND INFORMATION SCIENCES

SCHOOL OF COMPUTING AND INFORMATION TECHNOLOGY

SOFTWARE DESIGN DOCUMENT

FINAL YEAR PROJECT

JUNE 2022

BY

BSE 223

GROUP MEMBERSHIP

#  NAME                  REGISTRATION NUMBER  STUDENT NUMBER
1  LAKER SHARON          18/U/23399/EVE       1800723399
2  KATAMBA JAMES         18/U/23396/EVE       1800723396
3  MUJAMBERE REAGAN      18/U/23421/EVE       1800723421
4  KISAKYE JOEL NKANJI   18/U/23417/EVE       1800723417

Table 1.
SOFTWARE DESIGN DOCUMENT

1. INTRODUCTION
1.1 Purpose
1.2 Scope
1.3 Overview
1.4 Reference Material
1.5 Definitions and Acronyms

2. SYSTEM OVERVIEW

3. SYSTEM ARCHITECTURE
3.1 Architectural Design
Input Handler
Analysis model
Visualizer
Web Application
3.1.1 Data Analysis
3.1.1.1 Data Discovery
3.1.1.1.1 Data Requirement
3.1.1.1.2 Data Collection
3.1.1.1.3 Feature (Variable) Selection
3.1.1.2 Data Preparation and Transformation
3.1.1.2.1 Data Cleaning and Reduction
3.1.1.2.2 Data Wrangling
3.1.1.2.3 Data Manipulation (Feature Engineering)
3.1.1.2.4 Data Normalisation
3.1.1.2.5 Data Presentation (Visualisation)
3.2 Decomposition Description
Predictive model
3.3 Design Rationale

4. DATA DESIGN
4.1 Data Description
4.2 Data Dictionary

5. COMPONENT DESIGN
5.1 Prediction model
5.1.1 SARIMA model
5.1.2 Hybrid model
5.1.2.1 SOM
5.1.2.2 RT

6. HUMAN INTERFACE DESIGN
6.1 Overview of User Interface
6.2 Screen Images
6.3 Screen Objects and Actions

7. REQUIREMENTS MATRIX

8. APPENDICES

List Of Tables
■ Table 1.
■ Table 2.
■ Table 3.
■ Table 4.
■ Table 5.
■ Table 6.
■ Table 7.
■ Table 8.
■ Table 9.
■ Table 10.
■ Table 11.


List Of Images
■ Figure 3.1.
■ Figure 4.
■ Figure 3.2.1
■ Figure 3.2.1
■ Figure 3.2.2
■ Figure 3.2.3
■ Figure 4.1.1.1
■ Figure 4.1.1.2
■ Figure 4.1.1.3
■ Figure 4.1.1.4
■ Figure 4.1.1.5
■ Figure 4.1.1.6

■ Figure 4.1.1.7
■ Figure 4.1.1.8
■ Figure 4.1.1.9
■ Figure 4.1.2.1.
■ Figure 5.1.1.1.
■ Figure 5.1.1.2.
■ Figure 6.2.1
■ Figure 6.2.2.
■ Figure 6.2.3.
■ Figure 6.2.4.
■ Figure 6.2.5
■ Figure 6.2.7.
■ Figure 8.1.

1. INTRODUCTION

1.1 Purpose

This document gives a detailed technical description of the architecture and system design of
the Smart Water Analysis and Prediction (SWAP) system described in the software requirements
document. The SWAP system shall provide support services for smart water metering vendors. It
shall be used to monitor the past, present and future performance of the water vending
business.

1.2 Scope

The smart water metering vendors shall be able to access the SWAP system on the web. It
shall use data obtained directly from the database of the vendors as well as weather data from
WIMEA-ICT.

Once these data are in place, they shall be preprocessed, cleaned and prepared for both
analysis and prediction. Data wrangling, data manipulation (feature engineering), data
normalisation and data presentation (visualisation) shall be performed to analyse the data.

Different models that complement one another shall be combined. Model evaluation shall be
carried out, and the hyperparameters of each model tuned, in order to maximise prediction
accuracy.

The ability to predict the future with a degree of certainty enables better decisions about the
efficient allocation of resources. The SWAP system aims to provide this level of certainty to
water vendors through water demand analysis and prediction that takes into account the
different factors affecting water demand. A number of similar projects already exist around the
globe, but only a few consider weather as a factor affecting water demand, and this often leads
to less accurate predictions. We aim to maximise accuracy by taking the weather factor into
account when developing our prediction model.

1.3 Overview

This document presents an overview of the system, the design considerations leading to the
system architecture, describes the system architecture itself, and finally details the system
design.

This document has been organised into chapters as described below.

Chapter 1: describes the product scope, the purpose of this document, the reference material
and abbreviations used with their meanings.

Chapter 2: describes the system overview i.e. the general description of the functionality,
context and design of the SWAP system.

Chapter 3: describes the system architecture, which comprises the system architecture design,
the decomposition of the system and justification for the choice of the design.

Chapter 4: describes the data design of the system, which shows how the information domain of
the system is transformed into data structures.

Chapter 5: describes the component design of the system. It gives a functional description of
each component in detail.

Chapter 6: describes the human interface design of the system, the way users interact with the
system and provides screen images for each of the components.

Chapter 7: contains the requirement matrix for tracing the requirements defined in the software
requirements specification.

1.4 Reference Material


Bação, F., Lobo, V., & Painho, M. (2005). The self-organizing map, the Geo-SOM, and relevant
variants for geosciences. Computers and Geosciences, 31(2), 155–163.
https://doi.org/10.1016/J.CAGEO.2004.06.013

Liou, C. Y., Kuo, Y. T., & Huang, J. C. (2008). Conformal self-organizing map on curved
seamless surface. Neurocomputing, 71(16–18), 3140–3149.
https://doi.org/10.1016/J.NEUCOM.2008.04.031

Hebb, D. (1949). Organization of behavior. Wiley.

Amari, S.-I. (1980). Topographic organization of nerve fields. Bull. Math. Biol., 42, 339–364.

Bata, M. H., Carriveau, R., & Ting, D. S.-K. (2020). Short-term water demand forecasting using
hybrid supervised and unsupervised machine learning model. Smart Water, 5, 2.
https://doi.org/10.1186/s40713-020-00020-y

1.5 Definitions and Acronyms

SWAP: Smart Water Analysis and Prediction

CSV: Comma-Separated Values

RT: Regression tree

SOM: Self-organising map

SARIMA: Seasonal autoregressive integrated moving average

ACF: Autocorrelation Function

PACF: Partial Autocorrelation Function

System admins: Water vendors using the system

Super admins: The providers of the smart metering system

2. SYSTEM OVERVIEW

The SWAP system shall be designed to provide support services to smart water metering vendors,
both small and large scale, who are referred to as system administrators in this document, and
to their service providers, who are referred to as super admins. The system shall comprise five
components: the input handler, the analysis model, the prediction models, the visualizer and a
web application. It shall be used to monitor the past, present and future performance of the
water vending business.

The two major functions of the system are to analyse past performance and to forecast future
consumption and sales. The owner of the business shall be able to view the analysed past water
consumption and sales in graphical form on a daily, weekly, monthly and annual basis. The owner
shall also be able to view the predicted future water consumption and future sales on a daily,
weekly, monthly and annual basis.

The data that we will be using will be fetched directly from the database of the vendors already
using smart water metering for their business and incorporated with data from the weather
station to enhance accuracy.

These raw historical data will be preprocessed, cleaned and prepared to cater for the analysis
and the prediction.

Data wrangling, Data Manipulation (Feature Engineering), Data Normalisation and Data
Presentation (Visualisation) shall be performed to achieve the analysis of the data.

Different models that complement one another shall be combined. Model evaluation shall be
carried out, and the hyperparameters of each model tuned, in order to maximise prediction
accuracy.

The system shall be accessible by the users on the web.

3. SYSTEM ARCHITECTURE

3.1 Architectural Design


The system architecture will comprise five major components, namely the input handler, analysis
model, prediction models, visualizer and web application. The system will use the client-server
architecture pattern (see Figure 3.1).

Figure 3.1. System Architecture

Input Handler

The input handler is a back-end program written in Python. It will convert data sets from the
database on the server, which holds data from the smart water metres and geographical weather
data for the locations of the metres, into CSV files. The analysis model will use these files
to derive accurate insights for the predictive model.
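
As an illustration of the conversion performed by the input handler, the following is a minimal sketch rather than the final implementation. It assumes an in-memory SQLite database as a stand-in for the vendor's server, and the function name `export_table_to_csv` is hypothetical; the table and column names follow the data dictionary in section 4.2.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, query, out_file):
    """Run `query` and write the result set, with a header row, as CSV."""
    cursor = conn.execute(query)
    writer = csv.writer(out_file)
    writer.writerow([col[0] for col in cursor.description])  # column names
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)


# Stand-in data: an in-memory SQLite database replaces the vendor's server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WaterConsumption (meterId INTEGER, date TEXT, amountConsumed REAL)"
)
conn.executemany(
    "INSERT INTO WaterConsumption VALUES (?, ?, ?)",
    [(1, "2022-06-01", 120.5), (1, "2022-06-02", 98.0)],
)

buffer = io.StringIO()  # a file opened with newline="" would work the same way
row_count = export_table_to_csv(
    conn,
    "SELECT meterId, date, amountConsumed FROM WaterConsumption ORDER BY date",
    buffer,
)
csv_text = buffer.getvalue()
```

Writing to any file-like object keeps the handler independent of where the CSV is stored, so the same function could serve both the analysis model and the visualizer.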

Analysis model

The analysis model is the function of the system that will analyse the weighted data from the
input handler. It will describe past business performance, in terms of past water consumption
and sales, in a format that can be easily manipulated by the prediction model. To accomplish
this, we will take the weighted data generated by the input handler and apply machine learning
processes and standard validated techniques to ensure that the information is rich and
reliable, so that the predictive model can support data-driven decision making.

Prediction model

The predictive model uses data from the analysis model to predict future water consumption
levels. It will be built from the SARIMA model and the hybrid model, which will be trained and
tested during the early phase of the system implementation. The results are presented to the
user (Admin) through HTML pages that can be accessed in a browser after logging in over the
internet. The results are then saved through the input handler, which converts them to a
suitable format for future reference in the database.

Visualizer

The visualizer is a software component that will be programmed to carry out a number of
functions, such as fetching data from the database, converting it to CSV file format and using
the specified columns to plot graphs, charts and tables. The user will be able to view these in
the browser, presented as HTML files.

Web Application

The user (Admin) will use the web application to log in to the system. The credentials will be
provided when the user buys the metre. After logging in, the user will be able to access the
prediction model, the visualisations and the other functions that they are permitted to use.

3.2 Decomposition Description

3.2.1 Predictive model
Figure 3.2.1 illustrates the breakdown of the predictive model and how our two predictive
algorithms will work together to produce an accurate prediction. It shows the steps that will
be employed in developing, calibrating and validating the models.
The processed data is fed into the SARIMA model as it is, whereas for the hybrid model it is
divided into two groups, target data and input data. For the hybrid model, the target data is
fed into the SOM model and the input data is sent directly to the RT model. In the SOM model,
the target data is clustered, and the output cluster number, accompanied by the target data, is
added to the input data in the RT model. At this point, the RT model performs the prediction t
time steps ahead. After predicting the target, the performance of the model is assessed (i.e.
compared to the held-back water demand data). If the performance is satisfactory, the model is
implemented to forecast t time steps ahead.
However, the desired performance usually cannot be obtained from the first trial. In this case,
more neurons can be added to the SOM model and/or more leaves and folds can be added to the RT
model.

Figure 3.2.1. The prediction model
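
The assessment step, in which predictions are compared with the held-back water demand data, can be sketched as follows. The design does not name the error measures to be used, so the mean absolute error (MAE), root mean squared error (RMSE) and mean absolute percentage error (MAPE) are assumptions here, as are the function name `evaluate_forecast` and the sample figures.

```python
import math


def evaluate_forecast(actual, predicted):
    """Return MAE, RMSE and MAPE (%) of predictions against held-back values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # MAPE is undefined for zero demand, so those observations are skipped.
    pct = [abs(e) / abs(a) for a, e in zip(actual, errors) if a != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    return {"mae": mae, "rmse": rmse, "mape": mape}


held_back = [100.0, 200.0, 400.0]  # hypothetical daily demand in litres
forecast = [110.0, 190.0, 400.0]   # hypothetical model output
scores = evaluate_forecast(held_back, forecast)
```

A threshold on one of these scores could then decide whether the model is accepted or whether more neurons, leaves or folds are added for another trial.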

3.2.2 BPM for the predictive system
The Business Process Model (BPM) diagram illustrates the flow of the water consumption
prediction model that we intend to employ in our project. The first three stages of this
process are fully described in the previous sections. The first important aspect of this model
is that if the date selected for prediction falls on a weekend, the model chooses only weekends
for the training phase and ignores all weekdays, and vice versa if the selected date is a
weekday. The selected dataset will be used in the training and testing phases, which will
enhance the accuracy of our prediction.
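
The weekday and weekend selection rule can be sketched as below. This is an illustration under stated assumptions: Saturday and Sunday are taken as the weekend, and the record layout and the function names `is_weekend` and `select_training_rows` are hypothetical.

```python
from datetime import date


def is_weekend(day):
    """Saturday (5) and Sunday (6) count as the weekend."""
    return day.weekday() >= 5


def select_training_rows(records, prediction_date):
    """Keep only records whose day type matches that of the prediction date."""
    target_is_weekend = is_weekend(prediction_date)
    return [r for r in records if is_weekend(r["date"]) == target_is_weekend]


history = [
    {"date": date(2022, 6, 3), "litres": 310.0},  # Friday
    {"date": date(2022, 6, 4), "litres": 455.0},  # Saturday
    {"date": date(2022, 6, 5), "litres": 470.0},  # Sunday
    {"date": date(2022, 6, 6), "litres": 295.0},  # Monday
]
weekend_training = select_training_rows(history, date(2022, 6, 11))  # a Saturday
weekday_training = select_training_rows(history, date(2022, 6, 8))   # a Wednesday
```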

The last important aspect concerns the final stage. To achieve the most accurate and efficient
model, we will tune the hyperparameters of each algorithm in the evaluation step. This will
help us to improve our model evaluation results.

Figure 3.2.1. The BPM diagram

3.2.3 Use Case diagram for SWAP system

Figure 3.2.2. Use case diagram

The use case diagram above shows how the two main users interact with the system. The first is
the Admin (the owner of the business), who will be able to log in to the system, enter the
duration for which a prediction is wanted and then view the predicted value. The Admin will
also be able to view the different reports in visualised form and change water rates or
charges.
The super admin, on the other hand, can create a new admin, log in, and view reports and
historical data.

3.2.4 Class diagram for SWAP system

Figure 3.2.3. Class diagram

3.3 Design Rationale


The whole system uses a client-server architecture because it is cross-platform. The database
on the server provides information from the smart metres and also saves results from the
prediction model, which can be retrieved as historical results. Data science techniques for
prediction, analysis and visualisation were used because they conform to the system
requirements. We use a predictive model that combines SARIMA and the hybrid model because they
complement each other. This increases accuracy across the four predictive approaches: the
daily, weekly, monthly and custom predictions. After the past water consumption data has been
analysed in the analysis model, the predictive results can be viewed by the user and stored in
the database.

4. DATA DESIGN

4.1 Data Description


Our data will be stored on a local SQL server, where the SWAP system will fetch and store it in
SQL tables. We decided which database entities to use by considering the kind of information we
will need to make predictions and create good analytic visualisations. The table below
(Table 2) describes each of the entities that we will store in our database. Figure 4.1.2.1
contains the entity relation diagram for the SWAP system, describing how the entities in the
database relate to one another.

Entity              Description
SuperAdmin          This table will store information about the super administrator
SystemAdmin         This table will store information about the vendor, who is also known as
                    the system administrator
WaterMeter          This table will store information about the water metres
WeatherForecast     This table will store weather forecast information
TotalDailySales     This table will store details about the daily total sales for each water
                    metre
DailyTransactions   This table will store information about the daily transactions of each
                    water metre
WaterConsumption    This table will store the daily water consumption details for each water
                    metre
PredictionResults   This table will store all information about the predictions made for every
                    water metre

Table 2. Entity description

4.1.1 Binary relation between entities

The super administrator is responsible for registering system administrators.

SuperAdmin [1..1] creates [0..*] SystemAdmin

Figure 4.1.1.1

A super administrator can also register new super administrators.

SuperAdmin [1..1] can create [0..*] SuperAdmin

Figure 4.1.1.2

A system administrator can own one or more water metres.

SystemAdmin [1..1] owns [1..*] WaterMeter

Figure 4.1.1.3

Prediction results must belong to a certain water metre.

PredictionResults [0..*] belong to [1..1] WaterMeter

Figure 4.1.1.4

A system administrator can view prediction results.

SystemAdmin [1..1] views [0..*] PredictionResults

Figure 4.1.1.5

Prediction results use weather information.

PredictionResults [1..1] use [1..*] WeatherInfo

Figure 4.1.1.6

A system administrator can view the total daily sales of their metres.

SystemAdmin [1..1] views [1..*] TotalDailySales

Figure 4.1.1.7

The total daily sales come from daily transaction details.

TotalDailySales [1..1] come from [0..*] DailyTransactionDetails

Figure 4.1.1.8

Each water metre has its own daily transaction details.

WaterMeter [1..1] has [0..*] DailyTransactionDetails

Figure 4.1.1.9

4.1.2 Entity Relation Diagram for the SWAP system

Figure 4.1.2.1 entity relation diagram

4.2 Data Dictionary


The data used by the system shall be stored in a relational MySQL database. Below are the
tables and the data values to be stored.

PredictionResults
Name                Datatype  Length  Description
resultId            INT       NULL    Unique identifier
meterId {FK}        INT       NULL    Metre identifier
duration            VARCHAR   30      Duration of the prediction
Date                DATE      30      Start date of the week
forecastId {FK}     INT       NULL    Id of the forecast information
prediction          DOUBLE    30      Prediction results

Table 3.

SuperAdmin
Name                Datatype  Length  Description
superAdminId        INT       NULL    Super admin id
phoneNumber         INT       30      Super admin’s contact
password            VARCHAR   30      Super admin’s password

Table 4.

SystemAdministrator
Name                Datatype  Length  Description
systemAdminId       INT       NULL    Vendor’s id
superAdminId {FK}   INT       NULL    Super administrator’s Id
phoneNumber         INT       30      Vendor’s contact
password            VARCHAR   30      Vendor’s password

Table 5.

TotalDailySales
Name                Datatype  Length  Description
salesId             INT       NULL    Unique id
date                DATE      30      Date for which the sales were made
meterId {FK}        INT       NULL    Metre id
totalTransactions   INT       30      Number of transactions made on that day
amount              INT       30      Total sales made that day

Table 6.

DailyTransactions
Name                Datatype  Length  Description
transactionId       INT       NULL    Transaction identifier
meterId {FK}        INT       NULL    Metre identifier
date                DATE      30      Date of transaction
time                TIME      30      Time of transaction
amount              DOUBLE    30      Amount paid
balance             DOUBLE    30      Balance remaining
waterChargeRate     INT       30      Amount charged per unit of water

Table 7.

WaterConsumption
Name                Datatype  Length  Description
id                  INT       NULL    Unique identifier
meterId {FK}        INT       NULL    Specific metre id
date                DATE      30      Date
amountConsumed      DOUBLE    30      Amount of water consumed on that day

Table 8.

WaterMeters
Name                Datatype  Length  Description
meterId             INT       NULL    Unique metre id
systemAdminId {FK}  INT       NULL    Administrator id
waterChargeRate     DOUBLE    30      Rate charged per unit of water
region              VARCHAR   30      Location of the metre
gpsCoordinates      VARCHAR   30      GPS coordinates of the metre
districtName        VARCHAR   30      District where the metre is located

Table 9.

WeatherForecast
Name                Datatype  Length  Description
forecastId          INT       NULL    Unique Id of forecast information
region              VARCHAR   30      Location for the forecast
date                DATE      30      Date of the forecast
rain                DOUBLE    30      Rainfall forecast
temp                DOUBLE    30      Temperature forecast

Table 10.

5. COMPONENT DESIGN

5.1 Prediction model

5.1.1 SARIMA model


The SARIMA model, denoted by ARIMA(p, d, q) × (P, D, Q)_s, is a simple statistical model that
we will use to analyse and forecast time series data (Shumway and Stoffer 2000). The
non-seasonal order (p, d, q) gives the number of autoregressive (AR) parameters, differences
and moving average (MA) parameters. The seasonal order (P, D, Q)_s gives the number of seasonal
AR parameters, seasonal differences and seasonal MA parameters, together with the
periodicity s. The SARIMA model is formulated as (Shumway and Stoffer 2000):

$$\Phi_P(B^s)\,\phi(B)\,\nabla_s^D\,\nabla^d X_t = \delta + \Theta_Q(B^s)\,\theta(B)\,W_t$$

Where,
$B$: is the backshift operator ($B X_t = X_{t-1}$)
$\Phi_P(B^s)$: is the seasonal AR parameter of order P
$\phi(B)$: is the ordinary non-seasonal AR parameter
$\nabla_s^D$: is the seasonal difference component ($\nabla_s^D = (1 - B^s)^D$)
$\nabla^d$: is the ordinary non-seasonal difference component ($\nabla^d = (1 - B)^d$)
$X_t$: is the measured time series at time t
$\delta$: is the intercept
$\Theta_Q(B^s)$: is the seasonal MA parameter of order Q
$\theta(B)$: is the ordinary non-seasonal MA parameter
$W_t$: is the usual Gaussian noise process
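
The two difference components defined above can be illustrated with a short sketch. It shows only the differencing step, not a full SARIMA fit; the function name `difference` and the sample series (a linear trend plus a pattern that repeats every 4 steps) are assumptions made for illustration.

```python
def difference(series, lag=1, order=1):
    """Apply (1 - B^lag)^order: subtract the value `lag` steps back, `order` times."""
    result = list(series)
    for _ in range(order):
        result = [result[i] - result[i - lag] for i in range(lag, len(result))]
    return result


# Hypothetical series: a linear trend plus a pattern repeating every 4 steps.
pattern = [5, 0, -3, 2]
demand = [10 + 2 * t + pattern[t % 4] for t in range(12)]

# Seasonal difference with s = 4 and D = 1 removes the repeating pattern.
seasonal_diff = difference(demand, lag=4, order=1)
# Ordinary difference with d = 1 removes what is left of the trend.
stationary = difference(seasonal_diff, lag=1, order=1)
```

Each pass shortens the series by `lag` values, which is one reason the orders d and D are normally kept small.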

Figure 5.1.1.1. SARIMA algorithm

The pseudocode above shows the algorithm used to determine these parameters and develop
SARIMA models. SARIMA seasonal and non-seasonal parameters will be estimated iteratively
through plotting the Autocorrelation Function (ACF) and the Partial Autocorrelation Function
(PACF). SARIMA is a simple traditional model that can be trained and fitted on a small dataset.
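
To show what reading the ACF involves, the following sketch computes the sample autocorrelation of a series and picks the lag with the strongest positive correlation as a candidate seasonal period. It is an illustration only: the function name `autocorrelation` and the sample data are hypothetical, and the PACF, which needs a further recursion, is not covered.

```python
def autocorrelation(series, max_lag):
    """Sample ACF: r_k = sum((x_t - m)(x_{t+k} - m)) / sum((x_t - m)^2)."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    variance_sum = sum(c * c for c in centred)
    return [
        sum(centred[t] * centred[t + k] for t in range(n - k)) / variance_sum
        for k in range(max_lag + 1)
    ]


# Hypothetical demand that repeats every 4 steps.
demand = [12, 7, 4, 9] * 6
acf = autocorrelation(demand, max_lag=8)

# The lag (other than 0) with the largest autocorrelation suggests the period s.
seasonal_lag = max(range(1, 9), key=lambda k: acf[k])
```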

5.1.2 Hybrid model


The hybrid model will consist of two models, the SOM model and the RT model. The proposed
hybrid model will simply feed the output of the SOM clustering model, accompanied by other
desired correlated inputs, to the RT forecasting model.

5.1.2.1 SOM

The SOM, also known as the Kohonen neural network (Kohonen 1982), is an unsupervised learning
technique that reduces data dimensionality. It uses competitive learning to cluster input data
into groups while preserving the topology and the distribution of the input data. Simply
stated, an n-dimensional grid of neurons competes to win data points according to how close
these points are in the input pattern. Patterns that are close in the input space will be
mapped to units that are close in the output space (i.e. the grid) (Bação et al. 2005).
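
The competitive learning described above can be sketched with a very small one-dimensional SOM that clusters scalar demand values. This is a simplified illustration, not the algorithm of Figure 5.1.1.2: the grid size, learning rate, neighbourhood radius, decay schedule, function names and sample data are all assumptions.

```python
import math
import random


def nearest_neuron(weights, x):
    """Index of the neuron whose weight is closest to x (the winning unit)."""
    return min(range(len(weights)), key=lambda j: abs(weights[j] - x))


def train_som(values, n_neurons=3, epochs=50, start_rate=0.1, start_radius=0.5, seed=0):
    """Train a 1-D SOM (at least two neurons) on scalars; returns sorted weights."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    # Spread the initial weights evenly over the data range.
    weights = [lo + (hi - lo) * j / (n_neurons - 1) for j in range(n_neurons)]
    for epoch in range(epochs):
        decay = 1 - epoch / epochs  # shrinks from 1 towards 0
        rate = start_rate * decay
        radius = max(start_radius * decay, 0.1)
        for x in rng.sample(values, len(values)):  # shuffled pass over the data
            winner = nearest_neuron(weights, x)
            for j in range(n_neurons):
                # Neurons close to the winner on the grid are pulled along with it.
                influence = math.exp(-((j - winner) ** 2) / (2 * radius ** 2))
                weights[j] += rate * influence * (x - weights[j])
    return sorted(weights)


demand = [95, 100, 105, 295, 300, 305, 495, 500, 505]  # hypothetical litres per day
weights = train_som(demand)
clusters = [nearest_neuron(weights, x) for x in demand]
```

The cluster number of each observation is the value that the hybrid model would pass on to the RT model as an extra input.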

Figure 5.1.1.2
a
SOM model algorithm. This figure shows the algorithm that will be used to develop SOM
clustering models. Processed target data is fed to the model, where the model is trained,
tested and validated. After a satisfactory performance is reached, the time-ahead input data
will be read to predict the response (i.e. the cluster number).
b
RT model algorithm. This figure shows the algorithm used to develop RT forecasting models.
Processed target data will be fed to the model, where the model is trained, tested and
validated. After a satisfactory performance is reached, the time-ahead input data will be read
to predict the response (i.e. the water demand).

5.1.2.2 RT

RT is a supervised learning technique that is used for prediction. It is the numeric outcome
model of the general classification and regression tree (CART) introduced by Breiman et al.
(1984). The model is constructed from an assembly of rules based on variables extracted from
the dataset (i.e. predictors). These rules are represented by values that are selected to form
the best possible splits to differentiate instances (i.e. observations). Once a rule, also
called a decision, is selected, a split is applied at a specific node. This process is applied
to each node in the tree through a recursive procedure. RT models are obtained by repeatedly
dividing the data space and fitting a simple prediction model within each partition. As a
result, the data division can be represented graphically as a decision tree (Loh 2011). The
splitting process continues until a predefined limit is reached. This limit could be the point
where no further information gain can be achieved. Alternatively, splitting can be left to
continue and the tree pruned at the end of the process. Pruning is a technique that establishes
stopping rules to prevent the growth of tree sections that do not seem to improve the accuracy
of the predictive model.
RT model development begins with feeding the input data to the tree root. The data is then
filtered and sent from branch to branch until it reaches a leaf. The leaf is where the final
decision, called the response, is made. For our project, four RT models will be developed to
forecast a day, a week, a month and a year ahead. The first RT model will be a standalone
model. This model will not be fed any of the SOM output (i.e. no K inputs) as a model input.
The rest of the models (HYB-N2, HYB-N3 and HYB-N4) will be hybridised models; all predictors
will be fed to the model every time it predicts the future outflow demand and sales.
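
The way a regression tree chooses a split can be illustrated with a tree of depth one. The sketch below searches a single predictor for the threshold that minimises the summed squared error of the two sides, and each leaf then predicts the mean response of its observations. A full RT repeats this search recursively over all predictors; the function names and the temperature and demand figures are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Find the threshold on one predictor that minimises the SSE of both sides."""
    best = None
    candidates = sorted(set(xs))
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2  # midpoint between neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best["cost"]:
            best = {
                "threshold": threshold,
                "cost": cost,
                "left_mean": sum(left) / len(left),
                "right_mean": sum(right) / len(right),
            }
    return best  # None when the predictor has a single distinct value


def predict(split, x):
    """A depth-one tree: each leaf predicts the mean of its training responses."""
    return split["left_mean"] if x <= split["threshold"] else split["right_mean"]


# Hypothetical data: daily maximum temperature (deg C) against demand (litres).
temperature = [18, 20, 22, 28, 30, 32]
demand = [200, 210, 205, 300, 310, 305]
split = best_split(temperature, demand)
```

In the hybrid models, the SOM cluster number would simply be one more predictor column considered by the same split search.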

5.2. Analysis model


This section shows the breakdown of all the processes that have to be accomplished to produce
the analysed result.

5.2.1 Data Analysis
The first main function of the system is to analyse past business performance, in terms of past
water consumption and sales, and present it in graphical format. To accomplish this, we will
consider the data structure as our first step, since it is the foundation of every machine
learning process. The chosen analytic approach determines the data requirement, data collection
and data presentation (visualisation). The process will be broken down into two major modules,
data discovery and data preparation and transformation, as illustrated in figure 1 below.
It should be noted that this process will only be performed at the start of the implementation;
an automated analytic algorithm will be used for subsequent data analysis.

5.2.1.1 Data Discovery

5.2.1.1.1 Data Requirement

We will access historical data directly from the database of one of the main smart water
metering vendors and convert it into a CSV file. We investigated what type of data we needed
and found that the vendor's data is suitable. In this project, we require information about the
daily rate of water consumption by date and time. The required data are therefore the same as
the properties of the water data recorded in the database, such as the volume of consumption
and the amount paid, each of which is considered a feature.

5.2.1.1.2 Data Collection

After the data requirement phase, our next step will be to fetch this data and convert it into
a CSV file. We shall then measure and analyse it to gain accurate insight for our project,
using standard validated techniques to ensure that the information is rich and reliable, so
that data-driven decisions can be made in the modelling step.

5.2.1.1.3 Feature (Variable) Selection

This step starts immediately after data collection. Our focus here is on the attributes related
to the problem addressed in the study. This step will be carried out for both the analysis and
the prediction. Only appropriate data will be selected, to aid higher accuracy in the
prediction. We will therefore use feature selection to avoid wasting time and to avoid
degrading model performance. In our study, we will choose the date and time, the amount of
water consumed (Value), the amount paid, and the weather conditions at the location at that
time and date as the most significant features or variables.

5.2.1.2 Data Preparation and Transformation

When the data discovery phase is completed, we shall transform the raw data obtained from that
phase into data that can be used by the algorithms. This step is an exploratory process of
understanding the context of the data, and it continues until predictions are made.

The most important, complicated and time-consuming task in machine learning projects is to
consider the data structure, because collected data includes unexpected ranges of values,
missing values and incorrect combinations of data, among other problems. Data preparation is
required to transform the raw data into accurate and acceptable data for our use. This makes
data pre-processing, with its data cleaning, data reduction, data editing and data wrangling
steps, necessary for every machine learning project.

5.2.1.2.1 Data Cleaning and Reduction

Here we shall fix or remove incorrect, corrupted, incorrectly formatted, duplicate or
incomplete data within our dataset. A number of small activities will be carried out in this
phase, including removing a row or column of data, replacing values, filling in missing values
with a default value, detecting and removing duplicate data records, masking private and
sensitive information, matching the data against the requirements mentioned in the study, and
removing null values and empty cells. We shall also identify anomalies that contain errors and
remove them.
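
A few of the cleaning activities listed above can be sketched as follows. The rules shown are assumptions made for illustration: a record is identified by its metre and date, a missing amount is filled with a default of 0.0, and a negative amount is treated as an error. The function name `clean_readings` is hypothetical; the field names follow the WaterConsumption table.

```python
def clean_readings(rows, default_amount=0.0):
    """Drop duplicates and invalid rows, and fill missing amounts with a default."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row.get("meterId"), row.get("date"))
        if None in key or key in seen:
            continue  # incomplete key or duplicate record
        amount = row.get("amountConsumed")
        if amount is None:
            amount = default_amount  # fill in a missing value
        elif amount < 0:
            continue  # negative consumption is treated as an error
        seen.add(key)
        cleaned.append(
            {"meterId": key[0], "date": key[1], "amountConsumed": float(amount)}
        )
    return cleaned


raw = [
    {"meterId": 1, "date": "2022-06-01", "amountConsumed": 120.5},
    {"meterId": 1, "date": "2022-06-01", "amountConsumed": 120.5},    # duplicate
    {"meterId": 1, "date": "2022-06-02", "amountConsumed": None},     # missing value
    {"meterId": 1, "date": "2022-06-03", "amountConsumed": -4.0},     # invalid
    {"meterId": None, "date": "2022-06-04", "amountConsumed": 80.0},  # no metre id
]
cleaned = clean_readings(raw)
```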

5.2.1.2.2 Data Wrangling

We shall carry out data wrangling (data munging), which is the process of manipulating data, as
illustrated in figure 3 below. This involves changing the nature of the data, for example by
mapping data, changing the data distribution, and changing the format of the raw data into
another form that is more useful in the analysis phase. Data munging covers data normalisation,
data aggregation, format updating and data visualisation. We will consider various types of
data in this system, such as categorical and continuous variables. Some of the independent
variables, such as weekdays and weekends, will be categorical and will require one-hot
encoding, while the amount of water consumed (the dependent variable) will not.
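
One-hot encoding of the day of the week can be sketched as below. The column names, the extra `is_weekend` flag and the function names are assumptions made for illustration; the continuous target is passed through unchanged.

```python
from datetime import date

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def one_hot_weekday(day):
    """Return a 7-element 0/1 vector with a single 1 at the day's position."""
    return [1 if i == day.weekday() else 0 for i in range(7)]


def encode_row(day, litres):
    """Encode the categorical day of the week; keep the continuous target as it is."""
    features = dict(zip(DAY_NAMES, one_hot_weekday(day)))
    features["is_weekend"] = 1 if day.weekday() >= 5 else 0
    return features, litres


features, target = encode_row(date(2022, 6, 4), 455.0)  # a Saturday
```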

5.2.1.2.3 Data Manipulation (Feature Engineering)

We shall carry out data manipulation to ensure that we have a rich and valuable data resource.
This shall involve a number of small tasks, such as dividing one column into several columns,
removing some columns, aggregating data and adding new columns as new features. Data enrichment
is another type of data formatting; it includes joining data, connecting data and adding data
to supplement records that hold only basic information.

5.2.1.2.4 Data Normalisation

Normalisation is an efficient technique for improving data quality. When there is a
considerable difference between the values of features, the feature with the larger values
intrinsically dominates the prediction result. We will therefore apply normalisation as one of
the data preparation steps to put all variables on the same scale. This scale adjustment will
be made without losing data or distorting the relative differences between values. We will
therefore use normalisation to put all our variables in the same range, improve data integrity
and reduce data redundancy.
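
The document does not fix a particular normalisation method, so min-max scaling, $x' = (x - x_{min}) / (x_{max} - x_{min})$, is assumed in the sketch below. The function name `min_max_scale` and the sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


rainfall_mm = [0.0, 5.0, 20.0]            # hypothetical feature with a small range
consumption_l = [1000.0, 1500.0, 3000.0]  # hypothetical feature with a large range

scaled_rain = min_max_scale(rainfall_mm)
scaled_consumption = min_max_scale(consumption_l)
```

After scaling, both features lie in the same range, so neither dominates the other purely because of its units.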

5.2.1.2.5 Data Presentation (Visualisation)

Data visualisation or presentation is a simple display of trends and data patterns in charts,
graphs or tables. A good presentation of data leads us to a correct interpretation of the
relationships in the data, which allows us to analyse the data correctly in order to predict
the future. It is therefore an important step in our project: the better the visualisation of
the data, the more data we can interpret and assess. Instead of looking through many rows of
data, we can look at a summary of the data in a chart or a graph. Visualisation will help us to
understand the trend of the data and to communicate it to others through a simple and clear
picture of what is happening in a huge dataset. We shall consider factors such as the time
frame, since time plays a vital role in the trend of the dataset and its passage affects the
dataset pattern. By considering the effect of time along with other factors such as weather and
the day of the week, we will be able to determine how much water was used at defined time
intervals. Various types of charts will be used to produce clear visualisations, such as
histograms, bar graphs, line charts, scatter plots, pie charts and heat maps, among others.

6. HUMAN INTERFACE DESIGN

6.1 Overview of User Interface

The user will interact with the system in order to perform three main functions: enter the
duration for which a prediction is required, view graphical analyses of the past
performance of the business, and view the predicted future performance of the
business in financial terms.

Enter Duration for the Predictions


This feature will enable a user to select and filter predictions according to the range of
time he or she wants to view. The system will filter results by the period of the
user’s preference. This will reduce the time the user requires to get to the specific
results.
The results will be filtered in terms of:
· Daily – Enable viewing by day
· Weekly – Enable viewing by week
· Monthly – Enable viewing by month
· Annually – Enable viewing by year
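
The four filter options above can be sketched as a grouping step over daily predictions. This is only an illustration: the function names, the option strings and the sample values are assumptions, and weeks are taken to be ISO calendar weeks.

```python
from collections import defaultdict
from datetime import date


def period_key(day, period):
    """Return the bucket label for a date under the chosen filter option."""
    if period == "daily":
        return day.isoformat()
    if period == "weekly":
        iso_year, iso_week, _ = day.isocalendar()  # assumes ISO weeks
        return f"{iso_year}-W{iso_week:02d}"
    if period == "monthly":
        return f"{day.year}-{day.month:02d}"
    if period == "annually":
        return str(day.year)
    raise ValueError(f"unknown period: {period}")


def filter_predictions(predictions, period):
    """Sum the predicted litres in each bucket of the selected period."""
    totals = defaultdict(float)
    for day, litres in predictions:
        totals[period_key(day, period)] += litres
    return dict(totals)


# Hypothetical daily predictions: (date, predicted litres).
predictions = [
    (date(2022, 6, 29), 1000.0),
    (date(2022, 6, 30), 1100.0),
    (date(2022, 7, 1), 1200.0),
]

monthly = filter_predictions(predictions, "monthly")
annually = filter_predictions(predictions, "annually")
```

Grouping on a derived key keeps the stored data at daily resolution, so the same predictions can serve every filter option.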

View Analysis in Graphical Forms


This feature will enable the user to view analysis of the performance of the business
in the form of graphs. The data for the selected period will be displayed and visualised in
the form of plots. The plots will be in four categories, i.e.:

1. Line graphs
This will enable the user to visualise trends in the performance of the business and
water consumption trends for a selected period of time and area.

2. Bar graphs
This will enable the user to visualise the difference in water consumption levels of
selected areas or selected periods of time.

3. Pie charts
This will enable the user to visualise the areas with the greatest percentage of water
consumption and this will reflect an effect on the business growth.

4. Scatter plots
This will enable the user to visualise the general trends in which water is consumed
according to the selected period of time and this will reflect trends in the business
growth.

Viewing the predictions


This feature will enable the user to view the future estimations made by the predictive
model built into the system. The feature will also enable the user to view past
predictions, allowing the business owner or user to plan for required payments
and bills (for example power and water bills, if any, or maintenance).
The feature will display results according to the selected duration for convenience and
faster action.
The results are displayed in a tabular format. The table will consist of the following
columns:

Location
This will contain the place name of the area being predicted for by the model
Amount of Water
This will represent the amount of water in litres
Time (Period)
This is a day, week, month or year, according to the user’s chosen view
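
As a rough sketch, a row of the predictions table could be modelled as a small record with the three columns listed above. The class name, field names, column widths and sample values are hypothetical and serve only to illustrate the layout.

```python
from dataclasses import dataclass


@dataclass
class PredictionRow:
    location: str   # place name of the area being predicted for
    litres: float   # predicted amount of water, in litres
    period: str     # day, month or year label, depending on the chosen view


def render_table(rows):
    """Format prediction rows as a plain-text table with a header line."""
    lines = [f"{'Location':<12}{'Amount of Water (l)':>22}  Time (Period)"]
    for row in rows:
        lines.append(f"{row.location:<12}{row.litres:>22.1f}  {row.period}")
    return "\n".join(lines)


# Hypothetical values for illustration only.
rows = [
    PredictionRow("Area A", 15230.5, "2022-07"),
    PredictionRow("Area B", 9870.0, "2022-07"),
]
table = render_table(rows)
```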

Login
The user will also be provided with this feature in order to be authenticated into the
system.
The required fields are a phone number, email or username, together with a password.

6.2 Screen Images

This image shows the filter functionality, where the user can view predictions by the time
selected

Figure 6.2.1
This screen shows the dashboard of the system

Figure 6.2.2.

The predictions screen for viewing predictions

Figure 6.2.3

Login Screen for authentication

Figure 6.2.4

Graphical Visualisation

Figure 6.2.5

The current data screen for viewing the current data

Figure 6.2.6

Past Data Tabular View

Figure 6.2.7

6.3 Screen Objects and Actions

Login Screen
User Name
User names can be 6 to 20 characters long and may contain only letters and numbers; special
characters and spaces are not allowed. Users can also log in with the email or phone contact
from their registration.

Password
Passwords can be 6 to 20 characters long and may contain only letters and numbers; special
characters and spaces are not allowed.

Submit
If the user enters the correct username with the matching password, the system immediately
takes them to the dashboard.
Cancel
If the user wishes to exit the program, they click the “Cancel” button.
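
The length and character rules for the user name and password fields could be checked along the following lines. This is a sketch under the assumption that "letters" means the ASCII letters A to Z in either case; the pattern and function name are hypothetical, and the check does not cover login by email or phone contact.

```python
import re

# 6 to 20 characters, ASCII letters and digits only (assumed reading of the rule).
CREDENTIAL_PATTERN = re.compile(r"[A-Za-z0-9]{6,20}")


def is_valid_credential(text):
    """Return True if text satisfies the user name / password format rule."""
    return CREDENTIAL_PATTERN.fullmatch(text) is not None


# Hypothetical inputs for illustration.
checks = {
    "laker2022": is_valid_credential("laker2022"),    # valid
    "abc": is_valid_credential("abc"),                # too short
    "user name1": is_valid_credential("user name1"),  # contains a space
    "user_name1": is_valid_credential("user_name1"),  # contains a special character
}
```

Using `fullmatch` rather than `search` ensures the whole input obeys the rule, not just a part of it.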

Dashboard
Here the user is presented with a summary of the current status of the system and the user can
click to view details.

Results filter
Dropdown
The user can select a time period (daily, weekly, monthly or annually) using the dropdown.

Save
This button allows the user to save the currently selected filter. After saving, the user is
presented with the predictions for the selected period.

Cancel
This button closes the dialog if the user changes their mind.

Visualisation Screen
View Graphs
Here the user can only view the graphical data as selected by time or period on the results filter
screen.

Current Data Screen


View Tables
Here the user can only view the tabular data as selected by time or period on the results filter
screen.

Past Data Screen
View Tables
Here the user can only view the tabular data as selected by time or period on the results filter
screen.

7. REQUIREMENTS MATRIX

Requirement ID    Web application    Prediction model    Input handler    Visualizer    Analysis model

3.7.1.01 X

3.7.2.02 X

3.7.2.03 X

3.7.3.04 X

3.7.5.05 X

3.7.6.06 X

3.7.6.07 X

3.7.7.08 X

3.7.8.09 X
Table 11. Requirements matrix

8. APPENDICES
How the SOM model works

Figure 8.1 SOM model

Figure 8.1 shows a two-dimensional (2D) SOM structure with the number of neurons (N) equal to 3.
This 2D network structure gives a brief introduction to the mechanism of the SOM. The
inputs are fed to the map, where initial weights are assigned. Then, by calculating the Euclidean
distance, the indices are arranged in clusters. The number of resulting clusters depends on the
number of neurons selected when the map is structured. Each index is won by one neuron.
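
The assignment step described above, in which each input is won by the neuron at the smallest Euclidean distance, can be sketched as follows. The weights and inputs are made-up values for a map with N = 3 neurons, and the sketch deliberately leaves out the training phase in which a SOM updates its weights.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def winning_neuron(sample, weights):
    """Return the index of the neuron whose weight vector is closest to sample."""
    distances = [euclidean(sample, w) for w in weights]
    return distances.index(min(distances))


# Hypothetical weight vectors for N = 3 neurons (no training is shown here).
weights = [[0.1, 0.1], [0.5, 0.5], [0.9, 0.9]]

# Hypothetical, already normalised inputs.
samples = [[0.0, 0.2], [0.45, 0.6], [1.0, 0.8], [0.2, 0.0]]

# Group the input indices by the neuron that wins them.
clusters = {}
for index, sample in enumerate(samples):
    clusters.setdefault(winning_neuron(sample, weights), []).append(index)
```

With three neurons there can be at most three clusters, which matches the statement that the number of clusters depends on the number of neurons selected.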
