JUNE 2022
BY
BSE 223
GROUP MEMBERSHIP
1. INTRODUCTION
1.1 Purpose
1.2 Scope
1.3 Overview
1.4 Reference Material
1.5 Definitions and Acronyms
2. SYSTEM OVERVIEW
3. SYSTEM ARCHITECTURE
3.1 Architectural Design
Input Handler
Analysis model
Visualizer
Web Application
3.1.1 Data Analysis
3.1.1.1 Data Discovery
3.1.1.1.1 Data Requirement
3.1.1.1.2 Data Collection
3.1.1.1.3 Feature (Variable) Selection
3.1.1.2 Data Preparation and Transformation
3.1.1.2.1 Data Cleaning and Reduction
3.1.1.2.2 Data Wrangling
3.1.1.2.3 Data Manipulation (Feature Engineering)
3.1.1.2.4 Data Normalisation
3.1.1.2.5 Data Presentation (Visualisation)
3.2 Decomposition Description
Predictive model
3.3 Design Rationale
4. DATA DESIGN
4.1 Data Description
4.2 Data Dictionary
5. COMPONENT DESIGN
5.1 Prediction model
5.1.1 SARIMA model
5.1.2 Hybrid model
5.1.2.1 SOM
5.1.2.2 RT
7. REQUIREMENTS MATRIX
8. APPENDICES
List Of Tables
■ Table 1.
■ Table 2.
■ Table 3.
■ Table 4.
■ Table 5.
■ Table 6.
■ Table 7.
■ Table 8.
■ Table 9.
■ Table 10.
■ Table 11.
List Of Images
■ Figure 3.1.
■ Figure 4.
■ Figure 3.2.1
■ Figure 3.2.1
■ Figure 3.2.2
■ Figure 3.2.3
■ Figure 4.1.1.2
■ Figure 4.1.1.2
■ Figure 4.1.1.3
■ Figure 4.1.1.4
■ Figure 4.1.1.5
■ Figure 4.1.1.6
■ Figure 4.1.1.7
■ Figure 4.1.1.8
■ Figure 4.1.1.9
■ Figure 4.1.2.1.
■ Figure 5.1.1.1.
■ Figure 5.1.1.2.
■ Figure 6.2.1.
■ Figure 6.2.2.
■ Figure 6.2.3.
■ Figure 6.2.4.
■ Figure 6.2.5
■ Figure 6.2.7.
■ Figure 8.1.
1. INTRODUCTION
1.1 Purpose
This document gives a detailed technical description of the architecture and system design of
the Smart Water Analysis and Prediction (SWAP) system described in the software requirements
document. SWAP shall be designed to provide support services for smart water metering
vendors. It shall be used to monitor the past, present and future performance of the water
vending business.
1.2 Scope
Smart water metering vendors shall be able to access the SWAP system on the web. The system
shall use data obtained directly from the vendors' databases as well as weather data from
WIMEA-ICT.
Once all these data are in place, they shall be preprocessed, cleaned and prepared for both
analysis and prediction. Data wrangling, data manipulation (feature engineering), data
normalisation and data presentation (visualisation) shall be performed to analyse the data.
Different models that complement one another shall be combined, and during model evaluation
their hyperparameters shall be tuned in order to maximise prediction accuracy.
The ability to predict the future with a degree of certainty enables better decisions about the
efficient allocation of resources. The SWAP system aims to provide this level of certainty to
water vendors through water demand analysis and prediction that takes into account the
different factors that affect water demand. A number of similar projects already exist around
the globe, but only a few consider weather as a factor affecting water demand, and this often
leads to less accurate predictions. We aim to maximise accuracy by taking the weather factor
into account when developing our prediction model.
1.3 Overview
This document presents an overview of the system, the design considerations leading to the
system architecture, describes the system architecture itself, and finally details the system
design.
Chapter 1: describes the product scope, the purpose of this document, the reference material
and abbreviations used with their meanings.
Chapter 2: describes the system overview i.e. the general description of the functionality,
context and design of the SWAP system.
Chapter 3: describes the system architecture, which comprises the system architecture design,
the decomposition of the system and justification for the choice of the design.
Chapter 4: describes the data design of the system, which shows how the information domain of
the system is transformed into data structures.
Chapter 5: describes the component design of the system. It gives a functional description of
each component in detail.
Chapter 6: describes the human interface design of the system, the way users interact with the
system and provides screen images for each of the components.
Chapter 7: contains the requirement matrix for tracing the requirements defined in the software
requirements specification.
1.4 Reference Material
Liou, C. Y., Kuo, Y. T., & Huang, J. C. (2008). Conformal self-organizing map on curved
seamless surface. Neurocomputing, 71(16–18), 3140–3149.
https://doi.org/10.1016/J.NEUCOM.2008.04.031
Bação, F., Lobo, V., & Painho, M. (2005). The self-organizing map, the Geo-SOM, and relevant
variants for geosciences. Computers & Geosciences, 31(2), 155–163.
https://doi.org/10.1016/J.CAGEO.2004.06.013
Amari, S.-I. (1980). Topographic organization of nerve fields. Bull. Math. Biol., 42, 339–364.
Bata, M., Carriveau, R., & Ting, D. S.-K. (n.d.). Short-term water demand forecasting using
hybrid supervised and unsupervised machine learning model.
https://doi.org/10.1186/s40713-020-00020-y
2. SYSTEM OVERVIEW
The SWAP system shall be designed to provide support services for smart water metering
vendors, both small and large scale, who are referred to as system administrators in this
document, and for their service providers, who are referred to as super admins. The system
shall comprise five components: the input handler, the analysis model, the prediction models,
the visualizer and a web application. It shall be used to monitor the past, present and future
performance of the water vending business.
The two major functions of the system are to analyse past performance and to forecast future
consumption and sales. The owner of the business shall be able to view the analysed
performance, in terms of past water consumption and sales, in a graphical format on a daily,
weekly, monthly and annual basis. The owner shall also be able to view the predicted future
water consumption and future sales on a daily, weekly, monthly and annual basis.
The data that we will be using will be fetched directly from the databases of vendors already
using smart water metering for their business and combined with data from the weather
station to enhance accuracy.
These raw historical data will be preprocessed, cleaned and prepared for the analysis and the
prediction.
Data wrangling, data manipulation (feature engineering), data normalisation and data
presentation (visualisation) shall be performed to analyse the data.
Different models that complement one another shall be combined, and during model evaluation
their hyperparameters shall be tuned in order to maximise prediction accuracy.
3. SYSTEM ARCHITECTURE
Input Handler
A back-end input handler, written in Python, shall convert data sets from the server database,
which holds the smart water meter readings and the weather data for the locations where the
meters are installed, into CSV files. The analysis model uses these files to derive accurate
insights for the predictive model.
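As a minimal sketch of this conversion step, the snippet below exports the result of a database query to CSV using only the Python standard library. The table name `meter_readings`, its columns, and the use of SQLite are assumptions made for illustration; the actual vendor database engine and schema are not specified in this document.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query, out):
    """Run `query` on an open connection and write header + rows to `out` as CSV."""
    cursor = conn.execute(query)
    # Column names are taken from the description of the executed query.
    header = [column[0] for column in cursor.description]
    writer = csv.writer(out)
    writer.writerow(header)
    rows_written = 0
    for row in cursor:
        writer.writerow(row)
        rows_written += 1
    return rows_written


# Hypothetical demo data: an in-memory table standing in for the vendor database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meter_readings (meter_id TEXT, reading_date TEXT, litres REAL, temp_c REAL)"
)
conn.executemany(
    "INSERT INTO meter_readings VALUES (?, ?, ?, ?)",
    [("M1", "2022-06-01", 120.5, 24.0), ("M1", "2022-06-02", 98.0, 22.5)],
)
buffer = io.StringIO()
rows_written = export_query_to_csv(
    conn,
    "SELECT meter_id, reading_date, litres, temp_c FROM meter_readings ORDER BY reading_date",
    buffer,
)
print(buffer.getvalue())
```

Writing to a file-like object keeps the function usable both for files on disk and for in-memory buffers during testing.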
Analysis model
The analysis model is the part of the system that analyses the weighted data from the input
handler, covering past business performance in terms of past water consumption and sales, and
presents the result in a format that can be easily manipulated by the prediction model. To
accomplish this, we will take the weighted data generated by the input handler and apply
machine learning processes and standard validated techniques to ensure that the information
is rich and reliable, so that data-driven decisions can be made by the predictive model.
Prediction model
The predictive model uses data from the analysis model to predict future water consumption
levels. It relies on built-in models, namely the SARIMA model and the hybrid model, which
will be trained and tested during the early phase of the system implementation. The results
are presented to the user (Admin) as HTML pages that can be accessed through a browser after
logging in over the internet. They are then saved to the database through the input handler,
which converts them to a suitable format for future reference.
Visualizer
The visualizer is a software component that will be programmed to carry out a number of
functions, such as fetching data from the database, converting it to CSV file format and using
the specified columns to plot graphs, charts and tables. The user will be able to view these
through the browser, presented as HTML files.
Web Application
The web application will be used by the user (Admin) to log in to the system. The credentials
will be provided when the user buys the meter. Logging in will enable the user to access the
prediction model, visualise data and perform the other functions available to them as specified.
3.2.1 Predictive model
Fig. 3.2 illustrates the breakdown of our two predictive algorithms and how they will work
together to produce an accurate prediction outcome. It shows the steps that will be employed in
developing, calibrating and validating the models.
Processed data is fed as a whole into the SARIMA model, whereas for the hybrid model it is
divided into two groups, target data and input data. For the hybrid model, the target data is
fed into the SOM (self-organizing map) model and the input data is sent directly to the RT
(regression tree) model. In the SOM model, the target data is clustered, and the output cluster
number, accompanied by the target data, is added to the input data in the RT model. At this
point, the RT model performs the prediction t time steps ahead.
After predicting the target, the performance of the model is assessed (i.e. compared to the
held-back water demand data). If the performance is satisfactory, the model is implemented to
forecast t time steps ahead.
However, the desired performance usually cannot be obtained on the first trial. In this case,
more neurons can be added to the SOM model and/or more leaves and folds can be added to
the RT model.
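The assessment against held-back data described above can be sketched as follows. This is only an illustration: the choice of RMSE and MAPE as metrics, the 10% acceptance threshold and the naive "repeat the last value" forecast are assumptions, not choices stated in this document.

```python
import math


def holdout_split(series, holdout):
    """Split a time-ordered series into a training part and the last `holdout` points."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    return series[:-holdout], series[-holdout:]


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def is_satisfactory(actual, predicted, max_mape=10.0):
    """Hypothetical acceptance rule: accept the model if MAPE is within `max_mape` percent."""
    return mape(actual, predicted) <= max_mape


# Hypothetical daily demand values (litres).
demand = [100.0, 110.0, 120.0, 130.0, 125.0, 135.0]
train, held_back = holdout_split(demand, 2)
# Stand-in forecast: repeat the last training value (a real model would go here).
forecast = [train[-1]] * len(held_back)
print(rmse(held_back, forecast), mape(held_back, forecast), is_satisfactory(held_back, forecast))
```

If the check fails, the retuning loop described above (more SOM neurons, more RT leaves or folds) would be repeated before the model is used for forecasting.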
3.2.2 BPM for the predictive system
The Business Process Model (Fig. 3.3) illustrates the flow of the water consumption prediction
model that we intend to employ in our project. The first three stages of this process are fully
described in the previous sections. The first important aspect of this model is that, if the
selected prediction date is a weekend, the model chooses only weekends for the training phase
and ignores all weekdays; conversely, if the selected date is a weekday, only weekdays are
used. The selected dataset will be used in the training and testing phases, which will enhance
the accuracy of our predictions.
The last important aspect concerns the final stage: to achieve the most accurate and efficient
model, we will tune the hyperparameters of each algorithm in the evaluation step. This will
help us to improve our model evaluation results.
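The weekday/weekend selection rule can be sketched as below. The sketch assumes that the weekend is Saturday and Sunday and that each historical record is a `(date, litres)` pair; both are illustrative assumptions.

```python
from datetime import date


def is_weekend(day):
    """True for Saturday or Sunday (date.weekday(): Monday is 0, Sunday is 6)."""
    return day.weekday() >= 5


def select_training_rows(rows, target_day):
    """Keep only rows whose day type (weekend or weekday) matches the target day."""
    target_is_weekend = is_weekend(target_day)
    return [row for row in rows if is_weekend(row[0]) == target_is_weekend]


# Hypothetical history of (date, litres consumed) records.
history = [
    (date(2022, 6, 3), 410.0),  # Friday
    (date(2022, 6, 4), 530.0),  # Saturday
    (date(2022, 6, 5), 515.0),  # Sunday
    (date(2022, 6, 6), 420.0),  # Monday
]
weekend_rows = select_training_rows(history, date(2022, 6, 11))  # target is a Saturday
weekday_rows = select_training_rows(history, date(2022, 6, 8))   # target is a Wednesday
print(weekend_rows)
print(weekday_rows)
```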
Figure 3.2.1 The BPM diagram
3.2.3 Use Case diagram for SWAP system
The use case diagram above shows the user interaction with the system for our two main
users. The first is the Admin (the owner of the business), who will be able to log in to the
system, enter the duration for which a prediction is wanted and afterwards view the predicted
value. The Admin will also be able to view the different reports in a visualised form and
change water rates or charges.
The super admin, on the other hand, can create a new admin, log in, and view reports and
historical data.
3.2.4 Class diagram for SWAP system
prediction, weekly prediction, the monthly and the custom prediction. The predictive results can
be viewed by the user and stored in the database after analysing the past water consumption
data in the analysis model.
4. DATA DESIGN
Entity             Description
SuperAdmin         Stores information about the super administrator
SystemAdmin        Stores information about the vendor, who is also known as the system administrator
WaterMeter         Stores information about the water meters
TotalDailySales    Stores details about the daily total sales for each water meter
DailyTransactions  Stores information about the daily transactions of each water meter
WaterConsumption   Stores the daily water consumption details for each water meter
PredictionResults  Stores all information about the predictions made for every water meter
Table 2. Entity description
The super administrator is the one responsible for registering system administrators.
Figure 4.1.1.1
A super administrator can also register new super administrators.
SuperAdmin (1..1) can create SuperAdmin (0..*)
Figure 4.1.1.2
A system administrator can view the total daily sales of their meters.
SystemAdmin (1..1) views TotalDailySales (1..*)
Figure 4.1.1.7
TotalDailySales (1..1) come from DailyTransactionDetails (0..*)
Figure 4.1.1.8
4.1.2 Entity Relation Diagram for the SWAP system
PredictionResults
Name Datatype Length Description
SuperAdmin
Name Datatype Length Description
SystemAdministrator
Name Datatype Length Description
TotalDailySales
Name Datatype Length Description
DailyTransactions
Name Datatype Length Description
meterId {FK}
WaterConsumption
Name Datatype Length Description
Table 8.
WaterMeters
Name Datatype Length Description
WeatherForecast
Name Datatype Length Description
5. COMPONENT DESIGN
Where,
$\Phi_P(B^S)$: is the seasonal AR parameter of order $P$
$\phi(B)$: is the ordinary non-seasonal AR parameter
$\nabla_S^D$: is the seasonal difference component ($\nabla_S^D = (1 - B^S)^D$)
$\nabla^d$: is the ordinary non-seasonal difference component ($\nabla^d = (1 - B)^d$)
$X_t$: is the measured time series at time $t$
$\delta$: is the intercept
$\Theta_Q(B^S)$: is the seasonal MA parameter of order $Q$
$\theta(B)$: is the ordinary non-seasonal MA parameter
$W_t$: is the usual Gaussian noise process
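For reference, and assuming the model follows the standard multiplicative SARIMA form, the symbols defined above combine as shown below. Here $B$ is the backshift operator ($B X_t = X_{t-1}$) and $S$ is the length of the season; the equation is a reconstruction from the definitions, not a copy of the original figure.

```latex
\Phi_P(B^S)\,\phi(B)\,\nabla_S^D\,\nabla^d X_t = \delta + \Theta_Q(B^S)\,\theta(B)\,W_t
```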
The pseudocode above shows the algorithm used to determine these parameters and develop
SARIMA models. SARIMA seasonal and non-seasonal parameters will be estimated iteratively
through plotting the Autocorrelation Function (ACF) and the Partial Autocorrelation Function
(PACF). SARIMA is a simple traditional model that can be trained and fitted on a small dataset.
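To illustrate the kind of inspection involved, the sketch below computes a sample autocorrelation function and applies differencing in plain Python. It is a simplified stand-in for the plotting step: the function names, the weekly season length of 7 and the demo series are assumptions, and the PACF is not covered.

```python
def difference(series, lag=1):
    """Difference a series at the given lag (lag=1 ordinary, lag=S seasonal)."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    variance = sum(c * c for c in centred)
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(centred[i] * centred[i + k] for i in range(n - k)) / variance
        for k in range(max_lag + 1)
    ]


# Hypothetical daily demand with an exact weekly pattern (four identical weeks).
series = [10, 12, 11, 13, 12, 20, 22] * 4
correlations = acf(series, 7)
# A strong peak at lag 7 suggests a seasonal period of S = 7.
print([round(c, 3) for c in correlations])
# Seasonal differencing at lag 7 removes the repeating weekly pattern entirely here.
print(difference(series, 7))
```

In practice the lags at which the ACF and PACF cut off or decay guide the choice of the non-seasonal and seasonal orders, which are then refined iteratively.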
5.1.2.1 SOM
The self-organizing map (SOM), also known as the Kohonen neural network (Kohonen 1982), is
an unsupervised learning technique that reduces data dimensionality. It uses competitive
learning to cluster input data into groups while preserving the topology and the distribution
of the input data. Simply stated, the neurons of an n-dimensional grid compete to win data
points according to how close these points are in the input pattern. Patterns that are close
in the input space will be mapped to units that are close in the output space (i.e. the grid)
(Bação et al. 2005).
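The competitive learning described above can be sketched with a deliberately small, pure-Python SOM. This is an illustrative simplification, not the project's implementation: it uses a 1-D grid of neurons, a Gaussian neighbourhood and linearly decaying learning rate and radius, all of which are assumptions.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Index of the neuron whose weight vector is closest (Euclidean) to the sample."""
    distances = [math.dist(w, sample) for w in weights]
    return distances.index(min(distances))


def train_som(samples, n_neurons=3, epochs=50, learning_rate=0.5, seed=0):
    """Train a 1-D SOM and return the neuron weight vectors."""
    rng = random.Random(seed)
    # Initialise each neuron from a randomly chosen training sample.
    weights = [list(rng.choice(samples)) for _ in range(n_neurons)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius shrink as training progresses.
        decay = 1.0 - epoch / epochs
        rate = learning_rate * decay
        radius = max(1.0, n_neurons / 2) * decay
        for sample in samples:
            winner = best_matching_unit(weights, sample)
            for j, w in enumerate(weights):
                # Neurons close to the winner on the grid are pulled more strongly,
                # which is what preserves the topology of the input space.
                influence = math.exp(-((j - winner) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += rate * influence * (sample[d] - w[d])
    return weights


# Hypothetical 1-D demand values forming three loose groups.
samples = [(1.0,), (1.2,), (0.9,), (5.0,), (5.1,), (4.8,), (9.0,), (9.3,), (8.9,)]
weights = train_som(samples)
clusters = [best_matching_unit(weights, s) for s in samples]
print(weights)
print(clusters)
```

The cluster number returned for each sample is the value that the hybrid model would pass on as an extra input feature.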
Figure 5.1.1.2
a
SOM model algorithm. The figure above shows the pseudocode algorithm that will be used to
develop the SOM models. Processed target data is fed to the model, where the model is trained,
tested, and validated. After reaching satisfactory performance, the time-ahead input data will
be read to predict the response (i.e. the cluster number).
b
RT model algorithm. This figure shows the algorithm used to develop RT forecasting models.
Processed target data will be fed to the model, where the model is trained, tested, and
validated. After reaching satisfactory performance, the time-ahead input data will be read to
predict the response (i.e. the water demand).
5.1.2.2 RT
5.2.1 Data Analysis
Our first main function is that the system will be able to give an analysis of past business
performance, in terms of past water consumption and sales, in a graphical format. To
accomplish this, we will consider the data structure as our first step, since it is the
foundation of every machine learning process. The chosen analytic approach determines the
data requirement, data collection and data presentation (visualisation). The work will,
however, be broken down into two major modules, i.e. data discovery and data preparation and
transformation, as illustrated in figure 1 below.
It should be noted that this process will only be performed at the start of the implementation,
and an automated analytic algorithm will be used for subsequent data analysis.
We will access historical data directly from the database of one of the main smart water
metering vendors and convert it into a CSV file. We investigated what type of data we needed
and found that their data would work for us. In this project, we require information about the
daily rate of water consumption by date and time. Therefore, the required data are the same
as the properties of the water data recorded in the databases, such as the volume of
consumption and the amount paid, each of which is considered a feature.
After the data requirement phase, our next step will be fetching this data and converting it
into a CSV file, from which we shall derive accurate insights for our project using standard
validated techniques to ensure that the information is rich and reliable, so that data-driven
decisions can be made in the modelling step.
This step kicks off immediately after data collection. Our focus here is on the attributes
related to the problem addressed in the study. This step will be carried out in both the
analysis and the prediction. Only appropriate data will be selected, to aid higher accuracy in
the prediction. Thus, we will use feature selection to avoid wasting time and degrading model
performance. In our study, we will choose the date and time, the amount of water consumption
(Value), the amount paid, and the weather condition of the location at that time and date as
the most significant features or variables.
When the data discovery phase is completed, we shall move to the transformation of the raw
data obtained from the discovery phase into data that can be used by the algorithms. This step
is an exploratory journey from understanding the context of the data through to future
predictions.
The most important, complicated and time-consuming task in machine learning projects is to
consider the data structure, because collected data includes unexpected ranges of values,
missing values and incorrect combinations of data, among others. Data preparation is required
to transform the raw data into accurate and acceptable data for our use. This makes data
pre-processing, with its data cleaning, data reduction, data editing and data wrangling steps,
necessary for every machine learning project.
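A minimal cleaning pass for the problems listed above might look like the sketch below. The field names (`meter_id`, `timestamp`, `litres`) and the accepted range of 0 to 10,000 litres are hypothetical and chosen only for illustration.

```python
def clean_readings(rows, min_litres=0.0, max_litres=10_000.0):
    """Drop rows with missing, non-numeric or out-of-range consumption, and duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        raw = row.get("litres")
        if raw is None or str(raw).strip() == "":
            continue  # missing value
        try:
            litres = float(raw)
        except (TypeError, ValueError):
            continue  # not a number
        if not min_litres <= litres <= max_litres:
            continue  # unexpected range (this also rejects NaN)
        key = (row.get("meter_id"), row.get("timestamp"))
        if key in seen:
            continue  # duplicate reading for the same meter and time
        seen.add(key)
        cleaned.append({**row, "litres": litres})
    return cleaned


# Hypothetical raw rows as they might come out of a CSV export.
raw_rows = [
    {"meter_id": "M1", "timestamp": "2022-06-01 08:00", "litres": "120.5"},
    {"meter_id": "M1", "timestamp": "2022-06-01 09:00", "litres": ""},
    {"meter_id": "M1", "timestamp": "2022-06-01 10:00", "litres": "-4"},
    {"meter_id": "M1", "timestamp": "2022-06-01 08:00", "litres": "120.5"},
    {"meter_id": "M2", "timestamp": "2022-06-01 08:00", "litres": "abc"},
]
cleaned = clean_readings(raw_rows)
print(cleaned)
```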
We shall carry out data wrangling (data munging), which is the process of manipulating data
illustrated in figure 3 below. This involves changing the nature of the data, for example
mapping data, changing the data distribution, and changing the format of raw data to another
form that is more useful for the analysis phase. Data munging comprises data normalisation,
data aggregation, format updating and data visualisation. We will consider various types of
data in this system, such as categorical and continuous variables. Some of the independent
variables, such as weekdays, weekends and so forth, will be categorical and will require
one-hot encoding, while the amount of water consumption (the dependent variable) will not
require one-hot encoding.
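As a small illustration of one-hot encoding for a categorical variable such as the day type, consider the sketch below. The helper name and the category labels are assumptions used for the example.

```python
def one_hot(values, categories=None):
    """Encode categorical values as 0/1 vectors, one position per category."""
    if categories is None:
        categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


# Hypothetical day-type column; water consumption itself stays numeric.
day_types = ["weekday", "weekend", "weekday"]
categories, encoded = one_hot(day_types)
print(categories)
print(encoded)
```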
5.2.1.2.3 Data Manipulation (Feature Engineering)
We shall carry out data manipulation to ensure that we have a rich and valuable data
resource. This shall involve a number of small tasks, such as dividing one column into several
columns, removing some columns, aggregating data, adding new columns as new features, and
so forth. Data enrichment is another type of data formatting; it includes joining data,
connecting data, and adding data to supplement data that is otherwise limited.
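The column-level tasks mentioned above (splitting one column into several, removing columns, adding new feature columns) can be sketched as follows. The field names and the timestamp format are assumptions for illustration.

```python
from datetime import datetime


def engineer_features(row, drop=("notes",)):
    """Split the timestamp into parts, add a weekend flag and drop unused columns."""
    stamp = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M")
    # Remove the unwanted columns and the original combined timestamp.
    features = {k: v for k, v in row.items() if k not in drop and k != "timestamp"}
    features.update(
        {
            "year": stamp.year,
            "month": stamp.month,
            "day": stamp.day,
            "hour": stamp.hour,
            "is_weekend": int(stamp.weekday() >= 5),  # Saturday or Sunday
        }
    )
    return features


# Hypothetical raw record.
record = {"timestamp": "2022-06-04 14:30", "litres": 85.0, "notes": "manual check"}
features = engineer_features(record)
print(features)
```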
Data visualisation or presentation is a simple display of trends and data patterns in charts,
graphs, or tables. A good presentation of data leads us to a correct interpretation of the
relationship between data. This allows us to analyse the data correctly in order to predict the
future. It is therefore an important step in our project, because the better the visualisation
of the data, the more data we can interpret and assess. Instead of looking through many rows
of data, we can look at a summary of the data in a chart or a graph. Visualisation will help
us understand the trend of the data and convey it to others simply, through a clear picture of
what is happening in a huge dataset. We shall consider factors like the time frame, since it
plays a vital role in the trend of the dataset and its passage affects the dataset pattern. By
considering the effect of the time factor along with other factors like weather, days of the
week and so forth, we will be able to determine how much water was used at defined time
intervals. Various types of charts will be used to produce clear visualisations, such as
histograms, bar graphs, line charts, scatter plots, pie charts and heat maps, among others.
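Before any chart is drawn, consumption has to be summarised per time interval. The sketch below shows one possible aggregation step using the standard library only; the record layout and the supported periods (day, month, year) are assumptions, and weekly totals are left out for brevity.

```python
from collections import defaultdict
from datetime import datetime


def total_by_period(readings, period="month"):
    """Sum litres per day, month or year from (datetime, litres) pairs."""
    formats = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}
    if period not in formats:
        raise ValueError(f"unsupported period: {period}")
    totals = defaultdict(float)
    for stamp, litres in readings:
        totals[stamp.strftime(formats[period])] += litres
    # Sorted keys give the x-axis order for a line or bar chart.
    return dict(sorted(totals.items()))


# Hypothetical readings.
readings = [
    (datetime(2022, 5, 30, 8), 100.0),
    (datetime(2022, 5, 30, 18), 50.0),
    (datetime(2022, 6, 1, 9), 70.0),
]
monthly = total_by_period(readings, "month")
daily = total_by_period(readings, "day")
print(monthly)
print(daily)
```

The resulting period labels and totals map directly onto the axes of a line chart or bar graph.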
The user will interact with the system in order to perform three main functions: enter
the duration for which a prediction is needed, see graphical analysis of the past
performance of the business, and see the predicted future performance of the
system as far as capital is concerned.
1. Line graphs
This will enable the user to visualise trends in the performance of the business and
water consumption trends in a selected period of time and the area.
2. Bar graphs
This will enable the user to visualise the difference in water consumption levels of
selected areas or selected periods of time.
3. Pie charts
This will enable the user to visualise the areas with the greatest percentage of water
consumption and this will reflect an effect on the business growth.
4. Scatter plots
This will enable the user to visualise the general trends in which water is consumed
according to the selected period of time and this will reflect trends in the business
growth.
Location
This will contain the place name of the area being predicted for by the model
Amount of Water
This will represent the amount of water in litres
Time(Period)
This is either a month, day or year according to the user choice of view
Login
The user will also be provided with this feature in order to be authenticated into the
system.
The required fields are phone number, email or username and password.
6.2 Screen Images
This image shows the filter functionality, where the user can view predictions by the time
selected.
Figure 6.2.1
This screen shows the dashboard of the system.
Figure 6.2.2
Figure 6.2.3
Figure 6.2.4
Graphical Visualis
Figure 6.2.5
Figure 6.2.6
Figure 6.2.7
6.3 Screen Objects and Actions
Login Screen
User Name
User names must be 6 to 20 characters long and may contain only letters and numbers; special
characters and spaces are not allowed. Users can also log in with the email address or phone
contact from their registration.
Password
Passwords must be 6 to 20 characters long and may contain only letters and numbers; special
characters and spaces are not allowed.
Submit
If the user enters the right username with the matching password, they are taken immediately
to the dashboard.
Cancel
If the user wishes to exit the program, they can hit the “Cancel” button.
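The user name and password rules above can be checked with a single pattern, as in the sketch below. It assumes that "letters" means ASCII letters and covers only the format check, not authentication itself; the function name is hypothetical.

```python
import re

# 6 to 20 characters, ASCII letters and digits only (no spaces or special characters).
CREDENTIAL_PATTERN = re.compile(r"[A-Za-z0-9]{6,20}")


def is_valid_credential(text):
    """Return True if `text` satisfies the user name / password format rule."""
    return CREDENTIAL_PATTERN.fullmatch(text) is not None


print(is_valid_credential("vendor01"))   # acceptable format
print(is_valid_credential("abc"))        # too short
print(is_valid_credential("bad name1"))  # contains a space
```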
Dashboard
Here the user is presented with a summary of the current status of the system and the user can
click to view details.
Results filter
Dropdown
The user can select any option of time using the dropdown.
Save
This button allows the user to save the currently selected filter.
Cancel
This button closes the dialog if the user changes their mind.
After saving, the user is presented with the selected period data predictions.
Visualisation Screen
View Graphs
Here the user can only view the graphical data as selected by time or period on the results filter
screen.
Past Data Screen
View Tables
Here the user can only view the tabular data as selected by time or period on the results filter
screen.
7. REQUIREMENTS MATRIX
3.7.1.01 X
3.7.2.02 X
3.7.2.03 X
3.7.3.04 X
3.7.5.05 X
3.7.6.06 X
3.7.6.07 X
3.7.7.08 X
3.7.8.09 X
Table 11. Requirements matrix
8. APPENDICES
How the SOM model works
Figure 8.1 SOM model
Fig. 8.1 shows a 2-dimensional (2D) SOM structure with the number of neurons (N) equal to 3.
This 2D network structure gives a brief introduction to the mechanism of the SOM. The
inputs are fed to the map, where initial weights are assigned. Then, by calculating the
Euclidean distance, the indices are arranged in clusters. The number of resulting clusters
depends on the number of neurons selected when the map is structured. Each index is won by
one neuron.