College of Engineering
Engineering Systems Management Program
ESM 685: Capstone Course
Spring 2021
Instructor: Prof. Vian Ahmed
Group B
2. Research Methodology
2.7 Limitations
2.9 Ethics
3. Conclusion
4. References
While the previous sections describe the different stages of waste management and the
challenges the UAE faces in achieving its sustainability goals in the waste management
category, data on waste generation is essential to understanding why waste management is
needed. Waste generation forecasts, and the factors that influence waste generation,
therefore provide better insight for effective waste management plans and strategies.
Accordingly, this section aims to highlight the various factors that influence waste
Identifying these factors is challenging and is among the most important considerations in
MSW forecasting [2]. A review of the literature shows that many of these factors are
mentioned in [1-6]. Several studies have developed models or algorithms that use these
factors for waste forecasting. However, many of them could not address waste generation by
different activities and from different social groups. Based on the existing studies, the factors studied can be overall
waste management measures [14,15], and local policy and regulations [7,15]. These factors
Zhu et al. (2008) described waste management measures as involving facilities such as
dumpsites, collection bins, transport facilities, and recycling facilities, while policy
regulations were examined to determine whether certain fees (for disposal or incineration)
or reward systems for recycling enacted in the region had any impact. Milea (2009) and
O'Connell (2011) described the general socioeconomic factors considered to influence MSW
generation as GDP, urban population, urban paved roads, per capita consumption expenditure,
energy consumption, and the geographical location of the area [2,15]. Previous studies have
shown that the combined factors of GDP and urban population growth are the most important
These studies identified socioeconomic factors such as GDP, urban population, paved roads,
per capita consumption expenditure, and energy consumption as strong influences on waste
generation. Studies have found that GDP and urban population expansion are the main
This research also considers that facilities such as landfills, waste disposal facilities,
waste collection facilities, transport facilities, and recycling facilities, together with
municipal waste disposal and recycling regulations, all help in understanding whether
certain fees (for disposal or incineration) or a recycling reward system put in place in
the area have any influence. The previous studies and concepts will be kept in view while
In other words, in several regions, MSW has increased with rapid population growth and
rapid urbanization. Similar to the researchers in [6,7], who developed models based on
policy variables influencing the rate of solid waste generation, very few studies were
able to consider all these factors, owing to the inaccessibility of data, time constraints,
etc. Hence, it is evident that most scholars have considered socioeconomic factors, such
as population size, or economic status, such as GDP. The data of the latter were more
studies conducted in the field of waste indicator influence in the UAE, these papers have
enabled shortlisting of the possible waste indicators to be evaluated with respect to the
emirate of Dubai. At the First Annual Waste Management Conference in Dubai, UAE, Al-Sayigh
(1993) presented a paper on recycling and composting, organizing and retrieving data,
storage and transportation, and waste disposal. In his opinion, the overall effectiveness
of the system was as follows: much of the financial improvement was made in the waste
management papers, Fig. 1 reflects the adopted machine learning models in solid waste
treatment and their percentages. Organic solid waste treatments, which often include
recycling and composting, often have limitations, such as low effectiveness, low
reliability, high expense, and the possibility of environmental pollution. Interest in
using machine learning to solve the difficult challenges of organic solid waste treatment
has grown in the previous decade, yet this topic of study severely lacks a comprehensive
review of the findings. The study in [22] classifies machine learning studies from 2003 to
2020 and summarises the suitability of machine learning models for various application
domains, as well as the limitations of their applicability and future possibilities.
According to that study, research on municipal solid waste management was the most
prevalent, with additional research done on anaerobic digestion, thermal treatment,
composting, and landfill. The artificial neural network (ANN) is one of the most
extensively employed models for tackling a wide range of non-linear organic solid waste
challenges. Studies adopting Artificial Neural Networks (ANNs) account for
Algorithm with (9%), and Random Forest along with Decision Tree analysis (7%). Other
models such as Multiple Linear Regression (MLR), K-nearest neighbor (KNN), adaptive
network-based fuzzy inference system (ANFIS), gradient boosting machine (GBM), gradient
boosting regression tree (GBRT), GEP, and KMC account for 15% of the total. Although
conventional treatment and recycling methods for solid waste have inherent
shortcomings, such as low performance, low accuracy, high cost, and environmental risks,
they are often preferable to alternative approaches with no advantages [66]. As the
application of machine learning to the management of organic solid waste has grown in
prominence over the past decade, so has its use in tackling the growing number of difficult problems.
Although extensive research has been done, the literature lacks a comprehensive analysis
of findings. This report compiles and summarises all research papers published between 2003
and 2020, outlining their respective implementation areas, features, and applicability of
Additionally, it assesses the strengths and weaknesses of the proposed solutions and
anticipates future prospects. Municipal solid waste management studies accounted for the
vast majority of research done in this field, with anaerobic digestion, thermal treatment,
composting, and landfill making up the rest. An artificial neural network (ANN) is the most
commonly used model in the field of nonlinear organic solid waste (NOLSW) since it has
The study in [22] is limited to these operational and organizational issues. A major
constraint is the general lack of knowledge and education about the climate among the
public and policymakers.
Thus, it merely compounds the issue of waste management, making it even more
integrated, complex, yet adaptive semantic web-based modeling systems if any genuine
machine learning approaches, this research would opt for two widely used approaches,
artificial neural networks (ANN) and support vector machines (SVM), and a simpler model,
multiple linear regression, to predict waste generation for the years 2021-2030 in
Dubai.
2. Research Methodology
Based on the literature review findings and the research questions:
quantity of water consumed, total buildings under construction, and the number of
Factors such as the quantity of water consumed, the total number of buildings under
construction, and the number of visitors in Dubai are also significant factors in waste
production. The methodology of the research is discussed in the following sections,
which cover the research approach, the strategies, and the research methods used to
achieve the aim and objectives of the research. The research aims to
identify the factors that influence waste management and how effective and sustainable waste
management techniques can reduce management. The sections also discuss the data collection
techniques and data analysis methods. In addition to these, ethical considerations practices
Based on the main findings discussed, there is a lack of detailed studies on waste
management in the UAE; meanwhile, with the booming economy, the solid waste generated
continues to increase. Therefore, this section aims to briefly highlight the reasons
The UAE consists of seven emirates, and its population varies across them, with
Abu Dhabi, Dubai, and Sharjah having the largest populations [23]. Over recent years, the
Emirate of Dubai has seen substantial economic growth. Changes in the field of waste
management have also been observed. However, the primary challenges currently faced in
the UAE include the lack of research conducted in this field and the lack of effective and
This study intends to examine the relationship between a few major indicators, such as GDP
and Population, with the waste generated for each year in the region. “The value of all
products and services generated inside the borders of a country throughout a year is known
as GDP. The growth rate of the gross domestic product (GDP) is a vital measure of a
country's economic health.” A 1% increase in GDP was associated with an average rise of
1.76% in municipal domestic waste; this estimate was significant at the 5% level.
A 1% rise in population growth would likely result in a 0.11% rise in municipal
waste; this estimate was significant at the 95% confidence level. Over the past 10 years,
population and GDP in emerging destinations have grown at a rate of 4.4% a year, whereas
established economies' growth will be just 2.2% a year [67]. Health is dependent on
socioeconomic factors such as jobs, education, and income. Socioeconomic issues include
interdependent. Therefore, GDP and population play an essential role in the production of
waste. Subsequently, a forecast model for solid waste generation in the area for the next
ten years, 2021-2030, will be developed using a few machine learning approaches, in order
to understand the need to enforce sustainable methods sooner. Consistent data is critical
for a fruitful study. Factors such as the quantity of water consumed, total buildings under
construction, and the number of visitors in Dubai are considered as variables in this
research and will be discussed in detail. Solid waste and its types
referred to as municipal solid waste (MSW). Food waste is the biggest component. MSW also
includes paper, plastic, rags, metal, and glass. Debris from demolition and construction,
like wood and metal, is frequently included in the collected waste, as well as
minor quantities of toxic and hazardous trash, such as light bulbs, batteries leftover
Other forms of solid waste are industrial waste and agricultural waste.
Therefore, the analysis of the various waste indicators and forecasts will be conducted for
the emirate of Dubai only, because the available data and sample collection are limited to
Dubai. In other words, this research intends to explore the answers to the following questions:
quantity of water consumed, total buildings under construction, and the number of
Based on these four research questions, the research approach, in general, is a mixed
approach. The first two research questions are approached through the literature review and
explained using quantitative data, which would be discussed in the data collection section. It
is examined using multiple linear regression models and then forecasted using three suitable
phenomenon [22]. It is essential to state the research philosophy in order to identify a
research design suitable for answering the research questions. Based on
the research questions, the best-suited research philosophy is a positivist stance.
By definition, positivism holds that only "verifiable" information, which can be obtained
through the senses (such as by measuring), is accurate. The researcher's position in positivism
Observable and quantifiable scientific results are also the product of research. The core of
positivism is derived from facts and figures that contribute to statistical analysis.
Positivism agrees with the empiricist view that knowledge comes from our experiences as
human beings. It holds that the universe is made up of small, basic components and
occurrences that interact in a logical, observable, and regular manner. The positivist
approach will draw on quantitative methods to visualize patterns, work on quantifiable
observations, and obtain statistical analysis [26].
The research aims to identify the factors that contribute to waste generation and develop a
forecasting model to predict the waste generated in the UAE based on the identified factors.
According to the nature of the research problem, the most suited research approach is the
mixed approach, in which both explanatory and exploratory methods are adopted. The
exploratory approach allows for the exploration and identification of the factors contributing
superior comprehension of the current issue; however, for the most part it does not lead to
a conclusive outcome. Therefore, researchers use an exploratory approach to acquire
knowledge of a current phenomenon and gain a new understanding in order to frame the
problem more precisely [24].
It starts from a general idea, and the approach's results are used to discover
issues related to the subject of the research. For example, in an exploratory approach,
the research process varies according to the discovery of new information or
of this approach give answers to questions like what, how, and why [25]. Then, this approach
● How can machine learning techniques be applied in different stages of solid waste
management?
On the other hand, the explanatory approach helps this research by establishing causal
relationships between variables; in other words, this approach helps us assess
how one variable is responsible for changes in another variable [27]. An example could be
finding the causal relationship or correlation between the amount of waste and the population
generation in Dubai?
To answer this question, the hypotheses below are constructed to determine whether the
independent variables, i.e., socioeconomic factors, can be considered waste indicators for
Dubai. These hypotheses must be tested in this research:
H01: Gross Domestic Product significantly affects waste generation in Dubai.
H02: Population growth rate significantly affects waste generation in Dubai.
H03: The quantity of water consumed significantly affects waste generation in Dubai.
H04: Total buildings under construction significantly affect waste generation in Dubai.
H05: Total number of visitors significantly affects waste generation in Dubai.
Overall, this research approach adopts a mixed approach, adopting both deductive and
hypotheses are developed based on the data. Then, to draw conclusions, patterns,
resemblances, and regularities in experience (premises) are observed (or generating theory).
In deductive reasoning, a general premise leads to a particular inference. Moving from the
general to the specific is known as top-down reasoning. Through the deductive approach, we
can identify and deduce from the literature the factors that influence waste generation and
how machine learning applies to solid waste management. The inductive approach, in
contrast, supports deriving new findings, such as the waste indicators for Dubai and the
waste generation forecast, from the machine learning results, which generate new data that
supports real-world decision making [63].
the waste amount already available from online resources. This research comprises two
parts. In the first part, the researchers need to explore how and which factors influence
the waste amount. In the second part, the research will need to predict the waste amount
for the next 10 years (2021-2030). This implies that this research requires evaluating the
effect of more than one independent variable on a dependent variable. Therefore the most suited
begins after the fact has occurred, without interference from the researcher. This type of
research design, known as after-the-fact (ex post facto) research, involves a study that
commences after an event has occurred, with no researcher influence. Social research is
almost entirely focused on retrospective studies in which there is no way to change the
follow the complete research process of a bona fide experiment. Despite analysing historical
facts, post facto research shares some of the underlying logic of inquiry used in experiments.
[62]
Experiments enable us to pursue this research. Each factor is tested for correlation with
the generated waste, so that the waste indicators for Dubai are chosen accurately. In the world of
relationship between two variables. A commonly used correlation coefficient is Pearson's.
This is the correlation coefficient that appears in linear regression. Pearson's r
is usually the first correlation statistic encountered when getting started with
statistics, and it is almost always what is meant when someone refers to the correlation
The process of collecting and analysing the data for the qualitative and quantitative
parts of this research is discussed in this section. Based on the nature of the research,
the research strategy, and the research questions, a quantitative data collection method is
used. This method is suitable for this research because the available sources are literature,
knowledge from the literature review on whether and how machine learning applies to solid
waste management and its applications in the same field. This qualitative study would help
the researchers narrow down and shortlist their approach to analysing data for the third
The second research question calls for finding the various waste indicators; hence, deducing
the various possible factors known to influence solid waste generation comprises the second
phase. The researchers explore studies conducted in other cities and/or countries, as
highlights the various waste indicators, a quantitative data collection method is applied to
collect information that justifies the same indicators for the UAE and allows further
analysis. The statistical data for the derived factors would be sought for Dubai to identify
whether there is any correlation with waste generation over the same period. This requires
all the chosen variables to have sufficient data for the same periods as the waste
generation data for Dubai. The period for which quantitative data would be collected is
2000-2020. The waste indicators and the waste generated would help the researchers evaluate
and interpret the correlation between each indicator and the waste generation for each year.
Qualitative data alone would not be sufficient to determine whether these factors influence
solid waste generation in Dubai.
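The requirement that every indicator covers the same years as the waste data can be checked before any correlation is computed. The following is a minimal sketch in plain Python, assuming each series is held as a year-to-value dictionary; the function name and the sample values are hypothetical and are not taken from the Dubai datasets.

```python
def common_years(*series):
    """Return the sorted years for which every series has a non-missing value.

    Each series is assumed to be a dict mapping year -> value (None = missing).
    """
    year_sets = [{year for year, value in s.items() if value is not None}
                 for s in series]
    return sorted(set.intersection(*year_sets))


# Hypothetical values, for illustration only.
waste = {2000: 1.0, 2001: 1.2, 2002: 1.5}
gdp = {2001: 50.0, 2002: 55.0, 2003: 60.0}

years = common_years(waste, gdp)
print(years)  # [2001, 2002]
```

Only the years returned by such a check would enter the correlation analysis, so that each indicator is compared with waste generation over an identical period.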
After evaluating the various factors, the last phase, which is the core functionality of this
research, includes using the most related waste indicators to forecast the trend in solid waste
[Figure: data workflow with the stages Data Arrangement, Data Analysis, and Evaluation]
Before implementing the machine learning models, some preprocessing techniques would be
applied to clean the dataset. Then, depending on the analysis, there may be some data
transformation. Finally, the Linear Regression, Support Vector Machine, and Artificial
Neural Network models would be applied. Linear regression is a supervised machine learning
algorithm that performs a regression task. The regression model predicts a target value
on the basis of several factors. It is mostly used to investigate how variables are
connected and to forecast. SVM is a supervised machine
learning technique that can be used for classification or regression tasks. It is usually utilized
computational system that loosely resembles a biological neural network, and is commonly
referred to as a neural network. ANNs are made up of nodes that function like artificial
neurons, which approximate the connections and functions of neurons in a biological brain.
Neural networks function well with linear and nonlinear data; however, because of the wide
variety of training required for real-world operation, systems that employ neural
networks are commonly met with a fair amount of criticism. A machine learning algorithm
will only be able to understand the underlying structure that permits it to generalize to new
implemented while using forecasting techniques and evaluating the machine learning
Note: The rest of the information will be discussed in the next section 2.6.2.3
[28], for research and other applications regarding the UAE in general. However, only two of
the seven emirates, namely Abu Dhabi and Dubai, have their own statistical centers, [29]
and [30], respectively. The major drawback of the federal statistical centre is that, since
it was founded in 2015, as per the UAE Federal Law 2/2020, its waste data is not
available for years much earlier than 2010, and the desired data for a few indicators, such
as current GDP, is not available for years prior to 2009. The UAE data banks on the World
Bank portal [31], in contrast, have data for most of the indicators for the desired year
range but lack the waste datasets. Therefore, the unavailability of sufficient waste
datasets was the major reason for ruling out research on the UAE as a whole and hence
opting for either the emirate of Abu
The statistical centres for Abu Dhabi and Dubai have been functioning for several years.
Because Dubai's centre is older and hence has wider data sets for the desired waste
indicators and the waste generated, the emirate of Dubai was chosen for this research.
Data were collected from the Dubai Statistics Center and the official open data portal,
under the name of the waste dataset. This dataset is freely available on this platform and
can be used in machine learning data analysis. The dataset consists of 7 worksheets related
to the total waste amount, gross domestic product, water consumption, buildings under construction,
Therefore, in order to evaluate the factors and the solid waste forecasting for Dubai,
historical data is primarily taken from the yearbook, quarterly statistical reports, and
other publications issued by the Dubai Statistics Center and their official open portal
[90]. If there are any gaps in the data, such as for "Total collected wastes" for a
particular year, those values will be approximated from the values preceding and following
the missing one. For example, if a value is missing for a year, say 2004, it will be
computed as the average of 2003 and 2005 [58].
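The gap-filling rule described above can be sketched as follows. This is an illustrative sketch only, assuming the series is stored as a year-to-value dictionary with None marking a missing year; the function name and the figures are hypothetical and do not come from the Dubai Statistics Center.

```python
def fill_gaps_with_neighbor_mean(series):
    """Return a copy of `series` (year -> value or None) in which each missing
    year is replaced by the mean of the preceding and following years.

    A gap is left as None when either neighbouring year is also missing.
    """
    filled = dict(series)
    for year, value in series.items():
        if value is None:
            before = series.get(year - 1)
            after = series.get(year + 1)
            if before is not None and after is not None:
                filled[year] = (before + after) / 2
    return filled


# Hypothetical figures, for illustration only.
total_collected_waste = {2003: 100.0, 2004: None, 2005: 120.0}
filled = fill_gaps_with_neighbor_mean(total_collected_waste)
print(filled[2004])  # 110.0
```

The sketch deliberately leaves a gap unfilled when two consecutive years are missing, since the rule as stated only covers a single missing year between two known values.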
this process, the goal is to try to understand the dataset according to the research study.
The first step is to handle the missing values, irrelevant values, and NA values. Machine
learning models need data in numerical form, so it is necessary to transform text data into
numerical form with the help of available Natural Language Processing (NLP) approaches if
required [33].
model with a dataset [34]. Therefore, some data cleaning techniques must be applied to
handle missing values. First, all missing values in numerical data will be replaced with
the average value, and records with missing values will be removed from categorical data.
After this, the model will be implemented in RapidMiner, so the RapidMiner tool is used to
remove NA or missing values. RapidMiner supports data mining and machine learning processes,
including data loading and transformation (ETLs), data preprocessing and visualization,
learning model will be implemented to analyze waste amount data for predicting the next 10
years' trend. Three machine learning models will be used in the implementation process,
namely Linear Regression, Support Vector Machine, and Artificial Neural Network. These
machine learning methods will be used to train the forecasting model and evaluate model
performance, based on the literature review and other sources [37,38]. More details of these
Based on the dataset, the researchers will analyze and visualize the forecasting trend. In other
words, identifying the future trend of the waste amount in UAE, from all past combinations
between them, a correlation matrix could be used, along with other feature-relation techniques.
With the help of association or correlation finding techniques, one can determine which
variables or features are most important to implement the models for best accuracy and
The machine learning models implemented for the collected data (Multiple Linear Regression,
SVM, and Artificial Neural Network (ANN)) will use 70% of the data for training and
30% for testing to evaluate model performance [38]. However, different
training and testing proportions could lead to better results.
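A 70/30 split of the yearly records can be sketched as below. The sketch assumes a chronological split (earlier years for training, later years for testing), which is a common choice for time-ordered data but is an assumption here, since the text does not state how the records are partitioned; the function name is hypothetical.

```python
def chronological_split(rows, train_fraction=0.7):
    """Split time-ordered rows into (train, test) without shuffling.

    The first `train_fraction` of the rows (rounded down) forms the training
    set; the remaining rows form the test set.
    """
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# The study period 2000-2020 gives 21 yearly records.
years = list(range(2000, 2021))
train, test = chronological_split(years)
print(len(train), len(test))  # 14 7
```

Changing `train_fraction` (for example to 0.8) is one way to try the alternative proportions mentioned above.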
Multiple linear regression is a statistical technique that uses several explanatory
variables to predict the outcome of a response variable. Multiple regression is an
extension of simple linear (OLS) regression, which uses only one explanatory variable. An
SVM is a supervised learning model that uses classification algorithms for two-group
classification problems. After being given a set of labeled training data for each group,
an SVM model can categorize new text. An
artificial neural network (ANN) is a computing system designed to replicate the analysis and
standards. [61]
This section discusses the process of analyzing the quantitative data in order to
address the third and fourth research questions, which examine whether the shortlisted
socioeconomic factors have any correlation and compare their results to identify the waste
indicators. Moreover, the applicable waste indicators will be used to predict waste
generation for 2021-2030 using three different machine learning models. Those models would be created and analyzed
consumption, annual tourists visited, number of buildings under construction. To determine
how these selected factors affect the quantity of waste generated, the researchers will
test the stated hypotheses using scatter plots and Pearson's correlation analysis.
Correlation analysis provides information on the strength and direction of the linear
relationship between each factor and the waste generated. A correlation coefficient is used
to find relationships between variables or features of the dataset. This research is needed to find out
To begin the analysis, an xy scatter plot of "Total waste amount" against each of the
other variables must be drawn to visualize the trends and gain a general overview of
the dependence between them. Following this, the Pearson correlation
between each pair of variables will be calculated to find which factor has the major
influence on waste generation. For example, using these methods (scatter plot and
correlation) for shifted variables, the researchers would check the correlation between the
Assessing dependence requires a metric that shows whether one dependence "is
better" than another. To obtain it, it is necessary to compute the below defined Pearson
All the above steps will be repeated for each factor with a shift over time. For this
analysis, based on the hypotheses H01-H05, the correlation will be computed for the
following pairs of variables:
After computing the correlation coefficient for each pair, based on the basic principle
[40], the tested hypothesis will be rejected if the coefficient is below 0. The factors
that have the highest correlation values will be chosen as the waste indicators that
influence waste generation in Dubai.
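The correlation step, including the time shift, can be illustrated with the standard definition of Pearson's coefficient, r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² Σ(y - ȳ)²). The sketch below is plain Python; the function names and the sample series are hypothetical, and the lag convention (the factor in year t paired with waste in year t + lag) is an assumption, since the text does not specify the direction of the shift.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # A constant series has zero variance, so r is undefined in that case.
    return cov / sqrt(var_x * var_y)


def lagged_r(factor, waste, lag):
    """Correlate factor[t] with waste[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson_r(factor, waste)
    return pearson_r(factor[:-lag], waste[lag:])


# Hypothetical series, for illustration only.
factor = [1.0, 2.0, 3.0, 4.0, 5.0]
waste = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson_r(factor, waste)
print(r)  # 1.0

# Screening rule as stated in the text: reject when the coefficient is below 0.
rejected = r < 0
```

In practice the same call would be repeated for each factor in H01-H05 and for each lag of interest, and the resulting coefficients compared.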
Machine learning forecasting techniques are used to find future trends based on past
data. Given the collected data, we need to use these techniques to
predict future waste amount trends. RapidMiner provides windowing models as its default
forecasting models [41], but the specific model used for forecasting on the
collected data can be changed. Linear Regression, SVM, and Artificial Neural Network are
the three models with which the forecast is applied to the waste amount data. By default,
the RapidMiner ARIMA model generates forecasts for the next ten values [42]. Therefore, the
default model can be used. However, a need to tune the forecast models should be applied while
Note
The linear regression model is mostly used for modeling the relationship between dependent
and independent variables or features [43]. This research study predicts the waste amount
and finds the main factors or indicators that influence the quantity of waste. For this
reason, the model will help to describe the relationship between variables or factors. In
the collected data, there are several independent variables, such as the Gross Domestic
Product (GDP), the quantity of water consumed, population, total buildings under
construction, and the number of visitors, and one dependent variable of interest, which
is the total collected waste. The variable that the equation solves for is
called the dependent variable, while the independent variables are the factors used to predict
the value of the dependent variable [44]. The following equation could present the simple
y = α + βX [45]
The equation has two important parameters: α, which is the y-intercept of the regression
line, and β, which is the slope; y is the dependent variable. The regression line can show
a positive relationship, a negative relationship, or no relationship. If the graphed line
has no slope (just a flat line), there is no relationship between the variables. A positive
relationship exists when the regression line slopes upward. In contrast, a negative linear
relationship exists if the regression line slopes downward [46]. From the general trend
line between the multiple independent variables and the dependent variable, we can identify
the relationship's strength. It also helps us to understand the effect of the independent
variables on the dependent variable. Lastly,
the regression analysis helps in predicting trends and future values. All of these will be
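For the simple form y = α + βX, the intercept and slope can be estimated by ordinary least squares. The following is a minimal sketch in plain Python with hypothetical data; it covers only the single-variable case and is not the RapidMiner procedure used in this research.

```python
def fit_simple_ols(x, y):
    """Estimate (alpha, beta) in y = alpha + beta * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx                 # slope of the regression line
    alpha = mean_y - beta * mean_x   # y-intercept
    return alpha, beta


# Hypothetical data lying exactly on y = 1 + 2x, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
alpha, beta = fit_simple_ols(x, y)
print(alpha, beta)  # 1.0 2.0
```

A positive β corresponds to an upward-sloping line, a negative β to a downward-sloping line, and a β near zero to a flat line with no linear relationship.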
Support Vector Machine (SVM) is one of the popular machine learning algorithms and is
considered one of the most robust and accurate methods among the well-known data mining
algorithms [47].
The capacity of SVM to tackle nonlinear regression estimation problems makes it valuable
in time series forecasting [48]. It has become a topic of intensive study
because of its useful applications in classification and regression. In
RapidMiner, this machine learning model can be used before applying the forecast. The
Support Vector Machine offers a tuning mechanism if the model's output does not
meet the requirements: the hyperplane parameters can be tuned in
RapidMiner Studio and applied to the forecasting model [49].
[50]. Also, this model can be applied for short datasets because ANN generates output by
Artificial Neural Network is a deep learning modelling approach, and this model facilitates
analysis. An Artificial Neural Network is built from input, hidden, and output layers. An
advantage of this model is that it is reported to provide more relevant results than other
models [52]. The structure of a neural network algorithm has three layers, as shown in
figure 3. The input layer feeds the data values into the next layer (the hidden layer). The
hidden layer contains several complex functions that create predictors; these mathematical
functions, also called neurons, are hidden from the user, and their role is to transform the
input data and make predictions. Finally, the output layer collects the hidden layer's
predictions and produces the result, which is the model's prediction [53].
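The three-layer structure can be made concrete with a minimal forward pass. This is a sketch under assumptions: the sigmoid activation, the weights, and the function name `forward` are illustrative choices and are not taken from the model used in this study.

```python
import math

def sigmoid(z):
    """Squashing function applied by each hidden neuron."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One pass through input -> hidden -> output layers.

    hidden_weights holds one weight list per hidden neuron.
    Returns (prediction, hidden_activations).
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    prediction = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return prediction, hidden

# Illustrative weights only: two inputs, two hidden neurons, one output.
prediction, hidden = forward(
    inputs=[1.0, 2.0],
    hidden_weights=[[0.0, 0.0], [0.0, 0.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[1.0, 1.0],
    output_bias=0.5,
)
print(prediction, hidden)
```

In a trained network the weights and biases would be learned from data; here they are fixed by hand so that the flow of values through the layers is easy to follow.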
2.7 Limitations
The literature review helps the researchers to understand the possible waste indicators for
Dubai, as no previous studies or findings have been reported for the emirate. Given that the
research deals primarily with statistical data, data gaps are also very common.
Knowing that the region does not have extensive historical records, having gaps in
Population figures published after 2005 are primarily estimates that were later validated
against the number of people holding residential status in Dubai through employment.
Still, it is understood that such people are not necessarily residents of Dubai, although
this is expected to have a negligible impact.
The waste generation forecast would be based on the historical data of the waste indicators
and their projections. Hence, when the forecast period extends beyond one year, the models'
predictions depend on projected rather than observed indicator values. For example, when
predicting waste generation for the year 2025 using population size from previous years, the
population size in 2021, 2022, 2023, and 2024 must first be estimated. Therefore, it would
be beneficial to perform comparative multi-model forecasting rather than rely on a single
model, for better prediction accuracy.
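The dependence on projected indicators can be sketched as follows. This is an illustration under assumptions: the projection uses a constant average annual growth rate, the waste model is a hypothetical stand-in passed in as a function, and all numbers are made up rather than taken from Dubai statistics.

```python
def project_indicator(history, target_year):
    """Extend a {year: value} series to target_year using the
    average annual growth rate observed over the history."""
    years = sorted(history)
    first, last = years[0], years[-1]
    rate = (history[last] / history[first]) ** (1 / (last - first)) - 1
    projected = dict(history)
    for year in range(last + 1, target_year + 1):
        projected[year] = projected[year - 1] * (1 + rate)
    return projected

def forecast_waste(history, forecast_year, model):
    """Forecast waste for forecast_year from the previous year's
    (projected) indicator value; returns (forecast, projected series)."""
    projected = project_indicator(history, forecast_year - 1)
    return model(projected[forecast_year - 1]), projected

# Made-up population index and a hypothetical linear waste model.
population = {2018: 100.0, 2019: 110.0, 2020: 121.0}
waste_2025, projected = forecast_waste(population, 2025, model=lambda p: 0.5 * p)
estimated_years = [year for year in sorted(projected) if year > 2020]
print(estimated_years)
```

Every value for 2021 to 2024 in `projected` is an estimate, so any error in the assumed growth rate carries through to the 2025 forecast. This is one reason for comparing several models.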
While the objective of this research inclines towards an improvement, to provide beneficial
hopefully provide a source of study for future researchers. With a detailed research design,
the reliability and validity of the research data and of the consequent findings are of
utmost importance, as they aid in evaluating the quality of the study. Therefore, this
section describes how the researchers aim to measure the reliability and validity of their
research design.
instrument's reliability or the precision of the measuring instrument. The word "reliability"
has two meanings: first, whether or not you can get the same response each time you repeat a
measurement, and second, how trustworthy a measurement result is. Most simply, study
Construct validation enables the researchers to assess how fairly the research data and
approach represent the construct that the researchers seek to measure [57].
waste generation. After that, a forecasting model is developed to predict the waste
generated in Dubai based on these identified factors. Table 1 provides a comprehensive
summary of how the research questions are mapped and how their approaches and outcomes are
validated.
To ensure the validity of the above-discussed data, it is essential to test the reliability of the
data as well. One of the most commonly used assessment tools for this purpose is the
internal consistency coefficient, Cronbach's alpha, computed for each variable's dataset,
including the waste generated. The formula below [40] shows how to compute this
coefficient, α, for the datasets:

\[ \alpha = \frac{N \, c}{\upsilon + (N - 1) \, c} \]

Here N is the number of items (in this case, the number of years), c is the average
inter-item covariance among the items for a variable's dataset, and υ is their average
variance. The Cronbach's alpha coefficient would be computed for each variable, and for a
dataset to be considered acceptably reliable, the coefficient value must be 0.70 or higher.
These measurement techniques improve the quality of the research and give its beneficiaries
consistent and accurate quantitative results.
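The Cronbach's alpha computation described in this section can be sketched as below. It assumes the standard covariance form of the coefficient, α = N·c / (υ + (N − 1)·c), with sample variances and covariances. The function names and the toy series are hypothetical and are not the study's data.

```python
from itertools import combinations
from statistics import mean, variance

def covariance(a, b):
    """Sample covariance of two equally long series."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b)) / (len(a) - 1)

def cronbach_alpha(items):
    """Cronbach's alpha from a list of items (each item is a series)."""
    n = len(items)                                       # N: number of items
    v_bar = mean(variance(item) for item in items)       # average variance
    c_bar = mean(covariance(a, b) for a, b in combinations(items, 2))
    return n * c_bar / (v_bar + (n - 1) * c_bar)

def is_reliable(alpha, threshold=0.70):
    """Apply the 0.70 acceptance threshold stated in the text."""
    return alpha >= threshold

# Made-up toy items, for illustration only.
items = [[1, 2, 3, 4], [2, 1, 4, 3]]
alpha = cronbach_alpha(items)
print(round(alpha, 2), is_reliable(alpha))
```

For these two toy items the average variance is 5/3 and the average covariance is 1, so the coefficient is 2 / (5/3 + 1) = 0.75, which passes the 0.70 threshold.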
2.9 Ethics
The data collection process is one of the key phases of any research. It is therefore good
practice to prioritize ethical principles throughout the research itself. Handling data
sensitively and producing credible findings also strengthen the research fundamentally.
A few major ethical considerations that must not be overlooked are professional integrity and
accountability, the integrity of the data and methods, and the responsibilities of research
colleagues and instructors. Ethical considerations are among the most critical aspects of the
study. Experiment participants should never be placed in any kind of danger, and respect for
research participants' integrity should be considered a significant value [28]. Full consent
must be obtained from the participants before the study starts. Given that this research
follows a mixed approach, it involves both exploratory and explanatory studies. Hence, the
researchers need to conduct an impartial and transparent assessment of the findings and
acknowledge all sources of findings and data used to fulfill this research. Moreover, the
researchers are obliged to ensure that their research practices comply with the intended use
of the statistical data sources and do not create any conflict. It is also essential for the
researchers to acknowledge the data editing procedures, including any imputation and
missing data mechanisms, thus striving to promote
Above all, the researchers undertake to act professionally and in good faith throughout the
research phases and to strive towards successful research. Therefore, although this study
will not collect data directly from subjects such as humans or animals, the research
3. Conclusion
This report focused on identifying all aspects of the research methodology, including research
philosophy, research design, research approach, and strategy. It also discussed the analysis
methods used in the research depending on the research questions to analyze the available
The most readily available and accessible data were the population characteristics, GDP,
water consumption, number of houses being built, total number of tourists who visited the
emirate, and waste generation statistics of Dubai from the Dubai Statistics Center (DSC).
Therefore, this research will primarily focus on analyzing the influence of the shortlisted
waste indicators over the years 2000 to 2020 and, if they are found to be correlated, on
forecasting waste generation using appropriate ML algorithms. Furthermore, the literature
shows that the machine learning model most commonly adopted in the field of solid waste
treatment is the Artificial Neural Network (ANN). Therefore, this research will implement
machine learning approaches such as linear regression, SVM, and ANN to obtain the expected
results. Also, these results and findings will help support and
4. References
[1] R. Afroz, K. Hanaki, and R. Tudin, "Factors affecting waste generation: a study in a waste
management program in Dhaka City, Bangladesh," Environmental Monitoring and Assessment, vol.
179, no. 1–4, pp. 509–19, Aug. 2011, doi: 10.1007/s10661-010-1753-4.
[2] Chen Liu and X. Wu, "Factors influencing municipal solid waste generation in China: A multiple
statistical analysis study," Waste Manag Res, vol. 29, no. 4, pp. 371–378, Aug. 2010, doi:
10.1177/0734242X10380114.
[3] L. Sokka, R. Antikainen, and P. E. Kauppi, "Municipal solid waste production and composition in
Finland—Changes in the period 1960–2002 and prospects until 2020," Resources, Conservation and
Recycling, vol. 50, no. 4, pp. 475–488, Jun. 2007, doi: 10.1016/j.resconrec.2007.01.011.
[4] M. Sharholy, K. Ahmad, G. Mahmood, and R. C. Trivedi, "Municipal solid waste management in
Indian cities – A review," Waste Management, vol. 28, no. 2, pp. 459–467, Jan. 2008, doi:
10.1016/j.wasman.2007.02.008.
[5] C. Ghinea et al., "Forecasting municipal solid waste generation using prognostic tools and
regression analysis," Journal of Environmental Management, vol. 182, pp. 80–93, Nov. 2016, doi:
10.1016/j.jenvman.2016.07.026.
[6] D. Grazhdani, "Assessing the variables affecting on the rate of solid waste generation and
recycling: An empirical analysis in Prespa Park," Waste Management, vol. 48, pp. 3–13, Feb. 2016,
doi: 10.1016/j.wasman.2015.09.028.
[7] P. Beigl, S. Lebersorger, and S. Salhofer, "Modelling municipal solid waste generation: A
review," Waste Management, vol. 28, no. 1, pp. 200–214, Jan. 2008, doi:
10.1016/j.wasman.2006.12.011.
[8] S. Hong, R. M. Adams, and H. A. Love, "An Economic Analysis of Household Recycling of Solid
Wastes: The Case of Portland, Oregon," Journal of Environmental Economics and Management, vol.
25, no. 2, pp. 136–146, Sep. 1993, doi: 10.1006/jeem.1993.1038.
[9] S. Keser, S. Duzgun, and A. Aksoy, "Application of spatial and non-spatial data analysis in
determination of the factors that impact municipal solid waste generation rates in Turkey," Waste
Management, vol. 32, no. 3, pp. 359–371, Mar. 2012, doi: 10.1016/j.wasman.2011.10.017.
[10] J. Mateu-Sbert, I. Ricci-Cabello, E. Villalonga-Olives, and E. Cabeza-Irigoyen, "The impact of
tourism on municipal solid waste generation: The case of Menorca Island (Spain)," Waste
Management, vol. 33, no. 12, pp. 2589–2593, Dec. 2013, doi: 10.1016/j.wasman.2013.08.007.
[11] C. Wang, M.-D. Lin, and C. Lin, "Factors Influencing Regional Municipal Solid Waste
Management Strategies," Journal of the Air & Waste Management Association, vol. 58, no. 7, pp.
957–64, Jul. 2008.
[13] M. Mazzanti and R. Zoboli, "Waste generation, waste disposal and policy effectiveness:
Evidence on decoupling from the European Union," Resources, Conservation and Recycling, vol. 52,
no. 10, pp. 1221–1234, Aug. 2008, doi: 10.1016/j.resconrec.2008.07.003.
[14] H. Bach, A. Mild, M. Natter, and A. Weber, "Combining socio-demographic and logistic factors
to explain the generation and collection of waste paper," Resources, Conservation and Recycling, vol.
41, no. 1, pp. 65–73, Apr. 2004, doi: 10.1016/j.resconrec.2003.08.004.
[16] T. Getahun et al., "Municipal solid waste generation in growing urban areas in Africa: current
practices and relation to socioeconomic factors in Jimma, Ethiopia," Environmental Monitoring and
Assessment, vol. 184, no. 10, pp. 6337–45, Oct. 2012, doi: 10.1007/s10661-011-2423-x.
[17] D. Khan, A. Kumar, and S. R. Samadder, "Impact of socioeconomic status on municipal solid
waste generation rate," Waste Management, vol. 49, pp. 15–25, Mar. 2016, doi:
10.1016/j.wasman.2016.01.019.
[18] F. Bartolacci, A. Paolini, A. G. Quaranta, and M. Soverchia, "Assessing factors that influence
waste management financial sustainability," Waste Management, vol. 79, pp. 571–579, Sep. 2018,
doi: 10.1016/j.wasman.2018.07.050.
[19] J. Cheng, F. Shi, J. Yi, and H. Fu, "Analysis of the factors that affect the production of municipal
solid waste in China," Journal of Cleaner Production, vol. 259, p. 120808, Jun. 2020, doi:
10.1016/j.jclepro.2020.120808.
[20] M. R. Alavi Moghadam, N. Mokhtarani, and B. Mokhtarani, "Municipal solid waste
management in Rasht City, Iran," Waste Management, vol. 29, no. 1, pp. 485–489, Jan. 2009, doi:
10.1016/j.wasman.2008.02.029.
[21] L. Chhay, Md Amjad Hossain Reyad, R. Suy, M. R. Islam, and M. M. Mian, "Municipal solid
waste generation in China: influencing factor analysis and multi-model forecasting," The Journal of
Material Cycles and Waste Management, vol. 20, no. 3, pp. 1761–1770, Jul. 2018, doi:
10.1007/s10163-018-0743-4.
[22] H.-nan Guo, S.-biao Wu, Y.-jie Tian, J. Zhang, and H.-tao Liu, "Application of machine learning
methods for the prediction of organic solid waste treatment and recycling processes: A review,"
Bioresource Technology, vol. 319, p. 124114, 2021.
[25] Swedberg, R. (2020). Exploratory research. The production of knowledge: Enhancing progress
in social science, 17-41.
[26] R. Bevans, "A quick guide to experimental design: 4 steps & examples," 02-Apr-2021. [Online].
Available: https://www.scribbr.com/methodology/experimental-design/. [Accessed: 13-Apr-2021].
[29] Navarro-Esbrí, J.; Diamadopoulos, E.; Ginestar, D. Time series analysis and forecasting
techniques for municipal solid waste management. Resour. Conserv. Recycl. 2002, 35, 201–214.
[32] Claveria, O., Monte, E., & Torra, S. (2017). Data preprocessing for neural network-based
forecasting: does it really matter?. Technological and Economic Development of Economy, 23(5),
709-725.
[33] Nadkarni, P. M., Ohno-Machado, L., & Chapman, W. W. (2011). Natural language processing:
an introduction. Journal of the American Medical Informatics Association, 18(5), 544-551.
[34] H. Kang, "The prevention and handling of the missing data," Korean Journal of Anesthesiology,
vol. 64, no. 5, p. 402, 2013.
[35] Asuero, A. G., Sayago, A., & Gonzalez, A. G. (2006). The correlation coefficient: An overview.
Critical reviews in analytical chemistry, 36(1), 41-59.
[36] Benesty, J., Chen, J., Huang, Y., & Cohen, I. (2009). Pearson correlation coefficient. In Noise
reduction in speech processing (pp. 1-4). Springer, Berlin, Heidelberg.
[37] Cinar, Y. G., Mirisaee, H., Goswami, P., Gaussier, E., & Aït-Bachir, A. (2018). Period-aware
content attention RNNs for time series forecasting with missing values. Neurocomputing, 312, 177-
186.
[38] Mohammed, M., Khan, M. B., & Bashier, E. B. M. (2016). Machine learning: algorithms and
applications. Crc Press.
[39] Cleophas, T. J., & Zwinderman, A. H. (2018). Bayesian Pearson Correlation Analysis. In
Modern Bayesian Statistics in Clinical Research (pp. 111-118). Springer, Cham.
[40] Cleophas, T. J., & Zwinderman, A. H. (2018). Bayesian Pearson Correlation Analysis. In Modern
Bayesian Statistics in Clinical Research (pp. 111-118). Springer, Cham.
[41] Massaro, A., Maritati, V., & Galiano, A. (2018). Data Mining model performance of sales
predictive algorithms based on RapidMiner workflows. International Journal of Computer Science &
Information Technology (IJCSIT), 10(3), 39-56.
[42] Fattah, J., Ezzine, L., Aman, Z., El Moussami, H., & Lachhab, A. (2018). Forecasting of demand
using ARIMA model. International Journal of Engineering Business Management, 10,
1847979018808673.
[43] Schowe, B. (2011, June). Feature selection for high-dimensional data with RapidMiner. In
Proceedings of the 2nd RapidMiner Community Meeting And Conference (RCOMM 2011), Aachen.
[44] Han, J., & Kamber, M. (2007). Shu ju wa jue: Gai nian yu ji shu, di 2 ban = Data mining:
Concepts and techniques, 2nd ed. Beijing: Ji xie gong ye chu ban she.
[45] Gupta, S. (2015). A regression modeling technique on data mining. International Journal of
Computer Applications, 116(9), 27-29. doi:10.5120/20365-2570
[47] V. Kumar (Ed.), The top ten algorithms in data mining, Taylor & Francis Group, New York (2009).
[48] Zhao, H., & Magoulès, F. (2012). A review on the prediction of building energy consumption.
Renewable and Sustainable Energy Reviews, 16(6), 3586-3592. doi:10.1016/j.rser.2012.02.049
[49] Ahmad, A. S., Hassan, M. Y., Abdullah, M. P., Rahman, H. A., Hussin, F., Abdullah, H., &
Saidur, R. (2014). A review on applications of ANN and SVM for building electrical energy
consumption forecasting. Renewable and Sustainable Energy Reviews, 33, 102-109.
[50] Farzana, S., Liu, M., Baldwin, A., & Hossain, M. U. (2014). Multi-model prediction and
simulation of residential building energy in urban areas of Chongqing, South West China. Energy and
Buildings, 81, 161-169. doi:10.1016/j.enbuild.2014.06.007
[51] Marandi, F., & Fatemi Ghomi, S. (2016). Time series forecasting and analysis of municipal solid
waste generation in Tehran city. 2016 12th International Conference on Industrial Engineering
(ICIE). doi:10.1109/induseng.2016.7519343
[52] Kulisz, M., & Kujawska, J. (2020). Prediction of Municipal Waste Generation in Poland Using
Neural Network Modeling. Sustainability, 12(23), 10088.
[53] Abiodun, O., Jantan, A., Omolara, A., Dada, K., Mohamed, N., & Arshad, H. (2018, November
23). State-of-the-art in artificial neural network applications: A survey. Retrieved April 13, 2021,
from https://www.sciencedirect.com/science/article/pii/S2405844018332067
[54] Ghinea, C.; Drăgoi, E.N.; Comăniță, E.-D.; Gavrilescu, M.; Câmpean, T.; Curteanu, S.;
Gavrilescu, M. (2016) Forecasting municipal solid waste generation using prognostic tools
and regression analysis. J. Environ. Manag., 182, 80–93
[55] Fu, H.Z.; Li, Z.S.; Wang, R.H. (2015) Estimating municipal solid waste generation by
different activities and various resident groups in five provinces of China. Waste Manag.,
41, 3–11
[56] Prades, M.; Gallardo, A.; Ibàñez, M.V. (2015) Factors determining waste generation in
Spanish towns and cities. Environ. Monit. Assess., 187, 4098.
[57] Mahees, M.T.M.; Sivayoganathan, C.; Basnayake, B.F.A. (2011) Consumption, Solid
Waste Generation and Water Pollution in Pinga Oya Catchment Area. Trop. Agric. Res., 22,
239–250
[58] Xu, L.; Lin, T.; Xu, Y.; Xiao, L.; Ye, Z.; Cui, S. (2016) Path analysis of factors influencing
household solid waste generation: A case study of Xiamen Island, China. J. Mater. Cycles
Waste Manag., 18, 377–384
[59] Abbasi, M.; El Hanandeh, A. (2016) Forecasting municipal solid waste generation using
artificial intelligence modelling approaches. Waste Manag., 56, 13–22
[60] Xu, L.; Gao, P.; Cui, S.; Liu, C. (2013) A hybrid procedure for MSW generation
forecasting at multiple time scales in Xiamen City, China. Waste Manag., 33, 1324–1331.
[62] Abdoli, M.; Falahnezhad, M.; Behboudian, S., (2011). Multivariate econometric
approach for solid waste generation modeling: a case study of Mashhad, Iran. Environ. Eng.
Sci., 28(9): 627-633
[63] Kamil, M. (2004). The current state of quantitative research. Reading Research
Quarterly, 39, 100-107
[64] Adams, John. (1999) Recommended methods for the disposal of waste management,
Oxford, 43-36
[65] Al-Jarrah, Omar, and Hani Abu-Qdais. (2006) "Municipal Solid Waste Landfill Siting using
Intelligent System." Waste Management 26.3: 299-306
[66] Barrett, Alan, and John Lawlor. (1995) "The Economics of Solid Waste Management in
Ireland." Economic and Social Research Institute, Dublin
[67] Cheremisinoff, Nicholas P. (2003). "Handbook of Solid Waste Management and Waste
Minimization Technologies". Butterworth-Heinemann.