Submitted By:
Sharthok Ghosh
Roll Number: 160119
Registration Number: 101699
Session: 2015-16
Supervised By:
Md. Mahmudul Hasan
Assistant Professor, Department of Computer Science and Engineering
Pabna University of Science and Technology
January, 2022
DECLARATION
In accordance with the rules and regulations of Pabna University of Science and Technology, the following declarations are made:
I hereby declare that this thesis has been done by me under the supervision of Md. Mahmudul Hasan, Assistant Professor, Department of Computer Science and Engineering, Pabna University of Science and Technology, Pabna-6600.
I am pleased to certify that Sharthok Ghosh, Roll No: 160119, Reg No: 101699, Session: 2015-16, performed a thesis work entitled “Regression and Neural Network Based Prediction Model for the Participation of Female Employment in Bangladesh” under my supervision as a requirement for the completion of the course entitled ‘Project/Thesis’. As far as I am concerned, this is an original thesis that has been carried out over one year in the Department of Computer Science and Engineering, Pabna University of Science and Technology, Pabna-6600, Bangladesh.
Assistant Professor,
Bangladesh.
ACKNOWLEDGEMENT
All praise to God, who has created us and given us the greatest status among all His creations. First of all, I express my gratitude to the Almighty God for enabling me to complete this task successfully. I would like to express my deepest sense of gratitude to my honorable supervisor Md. Mahmudul Hasan, Assistant Professor, Department of Computer Science and Engineering (CSE), Pabna University of Science and Technology (PUST), for his scholastic supervision, valuable guidance, adequate encouragement, and helpful discussions throughout the progress of this work. I am highly grateful to him for allowing me to pursue this study under his supervision.
January, 2022
Author
ABSTRACT
Contents
1 Introduction
1.1 Overview
1.2 Background
1.3 Thesis Objective
1.4 Thesis Outcome
1.5 Summary
2 Literature Review
2.1 Data Mining
2.2 Data Mining Methods and Techniques
2.2.1 Techniques of Data Mining
2.2.2 Data Mining Tools
2.2.3 Related Method
2.2.3.1 ANOVA Test
2.2.3.2 Linear Regression Model
2.2.3.3 Residuals
2.2.3.4 QQ plot
2.2.3.5 Cross Validation
2.2.3.6 Gaussian Process
2.2.3.7 Random Forest
3 System Architecture
3.1 Proposed System
3.1.1 Literature Review
3.1.2 Source of Data
3.1.3 Data Preparation
3.1.4 ANOVA Test
3.1.5 Identifying Significant and Relevant Factors
3.1.6 Building Regression Models
3.1.7 Performance Improvement
3.2 Summary
4 Implementation
4.1 Implementation Step
4.2 Summary
References
List of Figures
List of Tables
Chapter 1
Introduction
1.1 Overview
Female employment is a vital factor for the economic development of Bangladesh, and it can be increased through many approaches. In our study, we try to find the significant factors responsible for rising female employment in Bangladesh. For this purpose, we use a stepwise linear regression algorithm, and we use k-fold cross-validation to evaluate the model's performance. Other approaches, such as Gaussian Process, Random Forest, and Decision Tree, are used to test the validity of our model, and we also use a neural network algorithm. To identify the factors, we use data from the World Bank's “World Development Indicators” (WDI-2020) for Bangladesh for the period 1991-2020.
1.2 Background
Female employment is an essential target of the Sustainable Development Goals (SDGs). The SDGs were adopted by the United Nations in 2015 as a universal call to end poverty and to achieve peace and prosperity for all by 2030. There are 17 SDGs, one of which is “Gender Equality”. Ending all forms of inequality against women is not only an essential goal in itself but also crucial for sustainable development. Female empowerment has been shown to contribute to economic growth and development, and female employment plays a significant role in advancing female empowerment [1].
Labor force participation is the proportion of the working-age population, aged 16 to 64, that is currently employed or looking for work. Students, homemakers, and people over the age of 64 are excluded from the labor force. The participation of both men and women in the workforce is necessary for a country's economy to progress. However, in developing nations such as Bangladesh, men are given greater weight in the workplace than women. Women, on the other hand, face numerous barriers to employment, which has a negative influence on the country's economy. Over the past two decades, researchers have been particularly interested in the female labor market in Bangladesh, as well as in other aspects of gender and development. Amid growing concern about gender inequality and its adverse effects on society and the economy, the contribution of women to the national economy has become a focus of discussion in Bangladesh, as in most countries [2]. Integrating the contribution of women has become essential for any economy based on equity and efficiency. It is now widely recognized that female participation in the labor market improves women's relative economic position and also enhances the performance and capacity of the economy from a broader perspective. In Bangladesh, the female contribution to the national economy is much lower because of low participation in the labor market [3]. While women make significant contributions to non-market activities, such as household chores and caring for children and the elderly at home, ensuring women's greater participation in market-based productive activities is an important factor for inclusive economic progress. Involving women more in mainstream economic activities matters not only for their economic skills but also for greater equity and from an overall growth perspective [4].
In view of the above analysis, we attempt to identify the factors behind female labor participation in order to support economic growth.
1.5 Summary
In this chapter, we discussed the overview, background, objective, and outcome of our thesis. The overview section summarizes the whole thesis work, the background section describes our problem, the objective section states the main objective of our study, and the outcome section presents the thesis outcome. In the next chapter, we review the literature: data mining, related methods, and related work that help us understand our work.
Chapter 2
Literature Review
In this chapter, we discuss related studies, related methods, and related work. Under related studies, we describe what data mining is, its advantages and disadvantages, the methods used in data mining (to decide which method suits our work), and the data mining tools we can use in our study. Under related methods, we describe the methods and techniques used in our work. The ANOVA test helps us identify which factors are significant for our study. The linear regression model helps us build a prediction model for female employment. The residual versus fitted plot helps us check whether our model is linear. The Q-Q plot helps us check whether the model's residuals depart strongly from a normal distribution. We also describe cross-validation, Gaussian Process, Random Forest, and Decision Tree, which we use to measure our model's performance. In the related work section, we discuss previous work related to ours. Section 2.1 discusses data mining, Section 2.2 the related methods of our study, Section 2.3 the related work, and Section 2.4 summarizes this chapter.
nesses in developing models based on historical data to predict who would respond to new marketing initiatives such as direct mail and online marketing. It provides retail businesses with many advantages, in the same way that it does for advertising [12].
Data mining is used for several constructive purposes, including advertising and retail, finance and banking, manufacturing, and so on [13]. It is likewise used by governments for diverse purposes. But it has its hazards: it raises questions about privacy, security, and the misuse of information, especially now that the internet is booming with social networks, e-commerce, forums, and blogs. Because of privacy concerns, many people are afraid that their private records will be gathered and used in an unethical manner that causes them problems [28].
Data mining tasks can generally be divided into two types based on what a particular task attempts to achieve: descriptive tasks and predictive tasks [14]. Predictive tasks use some variables to predict unknown or future values of other variables; they determine what might happen in the future. Descriptive tasks find human-interpretable patterns that describe the data; they describe what happened in the past [15].
The techniques of data mining are:
• Association
• Classification
• Decision Tree
• Clustering
• Prediction
Data mining tools help us carry out analyses quickly. They spare us the effort of coding well-known algorithms from scratch, while still giving us the power to adjust the tool's code according to our requirements. Each of the tools listed below has its own peculiarities in terms of implementation and its own merits [16]. The tools are:
• RapidMiner
• WEKA
• R
• Teradata
• Python
• Orange
• Kaggle, etc.
To find the related factors and build a prediction model, we use multiple stepwise linear regression. We identify the significant factors through the ANOVA test and examine the correlation among the factors. To measure performance, we use k-fold cross-validation and compute the Mean Absolute Error (MAE), Mean Relative Error (MRE), and Root Mean Square Error (RMSE). Apart from that, we test the performance of our model against other techniques such as Gaussian Process, Random Forest, and Decision Table. For a more realistic model, we also employ a neural network.
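The three error measures named above can be computed in a few lines. This is a minimal sketch that assumes the standard definitions (MRE taken as the mean of |y - ŷ| / |y|); the thesis's own equations are not reproduced in this section, so the exact formulas and the function names `mae`, `mre`, and `rmse` are assumptions.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average of |y - y_hat|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mre(actual, predicted):
    """Mean Relative Error: average of |y - y_hat| / |y| (assumes no actual value is 0)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the average squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


if __name__ == "__main__":
    y = [30.0, 32.0, 34.0]       # illustrative values, not thesis data
    y_hat = [31.0, 31.0, 34.0]
    print(mae(y, y_hat), mre(y, y_hat), rmse(y, y_hat))
```

MAE and RMSE are in the units of the target, while MRE is unit-free, which is why the three are usually reported together.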
An ANOVA test is a way to find out whether survey or experiment results are significant. In other words, it helps you decide whether to reject the null hypothesis or accept the alternative hypothesis [17].
Types of analysis of variance:
The two common types of analysis of variance are one-way ANOVA and two-way ANOVA. One-way or two-way refers to the number of independent variables (IVs) in the ANOVA test.
• One-way ANOVA has one independent variable (with two or more levels).
• Two-way ANOVA has two independent variables (each can have multiple levels).
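The one-way F statistic compares the variation between group means with the variation inside the groups. Below is a minimal sketch of that computation; `one_way_anova_f` is a hypothetical helper, and in practice R's `aov()` or `scipy.stats.f_oneway` would also report the p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of numbers, one list per level of the
    single independent variable.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For the toy groups [1, 2, 3], [4, 5, 6], [7, 8, 9], the function returns F = 27.0 with (2, 6) degrees of freedom, which corresponds to p = 0.001 and would be significant at the 0.05 level.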
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. Before attempting to fit a linear model to observed data, a modeler should first determine whether or not there is a relationship between the variables of interest. If there appears to be no association between the proposed explanatory and dependent variables, then fitting a linear regression model to the data probably will not provide a useful model [18].
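The least-squares line behind this idea can be sketched without any library. This is an illustration only, assuming a single explanatory variable; `fit_simple_ols` is a hypothetical name, and R's `lm()` would be the usual tool.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # intercept: the line passes through the means
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    print(fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7]))
```

`fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])` returns (1.0, 2.0, 1.0): intercept 1, slope 2, and R² = 1 for points that lie exactly on a line.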
2.2.3.3 Residuals
of the variables.
2.2.3.4 QQ plot
III Fit a model on the training set and evaluate it on the test set.
IV Retain the evaluation score and discard the model.
4. Summarize the skill of the model using the sample of model evaluation scores.
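The k-fold procedure listed above can be sketched as follows. This is a minimal, library-free illustration under our own assumptions: `k_fold_indices` and `cross_validate` are hypothetical helpers, and the caller supplies the model-fitting and scoring functions (for example, one of the error measures discussed in this chapter).

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds receive one extra element.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def cross_validate(fit, score, X, y, k=5, seed=0):
    """Return one evaluation score per fold.

    `fit(X_train, y_train)` must return a predict function, and
    `score(y_true, y_pred)` must return a number; both come from the caller.
    """
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in test_idx]
        scores.append(score([y[i] for i in test_idx], y_pred))
        # Keep the score; the fitted model is discarded on the next iteration.
    return scores
```

With 30 yearly observations and k = 5, each fold holds 6 years; the function returns five scores, which can then be summarized (for example, averaged or shown as a boxplot).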
Random forests are an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the individual trees' classes (classification) or the mean of their predictions (regression). Random decision forests correct for decision trees' habit of overfitting to their training set [21].
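A random forest combines bootstrap sampling of the rows with random selection of features. The sketch below is deliberately simplified and is not how production libraries work: every tree is a depth-1 regression stump, so the random feature subset drawn for a tree is also the candidate set for its only split. Full implementations such as R's `randomForest` package grow deep trees; `fit_stump` and `fit_random_forest` are hypothetical names.

```python
import random


def fit_stump(X, y, feature_ids):
    """Best single split (a depth-1 regression tree) over the given features, by squared error."""
    best = None
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((t - l_mean) ** 2 for t in left)
                   + sum((t - r_mean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, thr, l_mean, r_mean)
    if best is None:  # every candidate feature is constant: predict the mean
        mean = sum(y) / len(y)
        return lambda row: mean
    _, feat, thr, l_mean, r_mean = best
    return lambda row: l_mean if row[feat] <= thr else r_mean


def fit_random_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Average of stumps, each fit on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap: draw rows with replacement
        feats = rng.sample(range(p), max_features)    # random subset of features
        trees.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    # Regression forest: the prediction is the mean of the individual tree predictions.
    return lambda row: sum(tree(row) for tree in trees) / len(trees)
```

Averaging many trees that each saw a different bootstrap sample is what reduces the variance (overfitting) of a single tree.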
the social sciences, was employed to complete this project (ISSP 2010). Apart from quota sampling, the countries in the ISSP used a variety of sampling strategies, including (multi-)stage sampling, cluster sampling, and probability sampling. A hypothesis-testing approach was applied to carry out the work.
In 2014, Angela Cipollone, Eleonora Patacchini, and Giovanna Valenti published a paper on female labor market participation in Europe. The paper uses 20 years of individual data from 15 countries. It showed that the observed trends in women's participation differ significantly both across countries and across different groups of women. The authors explored these differences in trends by looking at the impact of policy and institutional factors in the labor market on the participation of women in different households. Labor market institutions and family-oriented policies account for about 25% of the actual increase in labor force participation for young women and more than 30% for highly educated women; surprisingly, changes in institutional and policy settings contributed less to explaining the participation of low-skilled women [24]. The authors created a unique dataset of comparable household and individual-level characteristics across countries and over time by combining microdata from two separate sources: the ECHP (European Community Household Panel) and the EU-SILC (European Union Statistics on Income and Living Conditions). The ECHP microdata come from a household survey with a standard framework carried out across the EU-15 Member States under Eurostat's supervision. The ECHP is an eight-year program that ran from 1994 to 2001.
In 2015, Eleni T. Stavrou, Wendy J. Casper, and Christiana Ierodiakonou published an article on organizational support for female employment and its relation to national gender empowerment and labor market conditions. The article examines both environmental and organizational-level characteristics related to women's employment, using a multi-source dataset collected across eight European countries. It found that organizations that support part-time work options are more likely to employ women. One reason for this may be that, in countries with a high Gender Empowerment Measure (GEM), offering part-time employment is a way for an organization to signal support for work-life balance, which makes it more attractive to women [25].
2.4 Summary
This chapter is the base chapter of this study: it describes data mining, related methods, and related work. From this chapter, we gain a broad idea of data mining, its advantages and disadvantages, its methods and tools, and many other things. We also learn about the ANOVA test, the regression model, the Q-Q plot, Gaussian Process, Random Forest, Decision Table, and so on. In the next chapter, we discuss our system architecture model, which describes how we solve our problem.
Chapter 3
System Architecture
In the previous chapter, we discussed data mining, related work, and related methods. In this chapter, we discuss the basic model of our proposed system. Section 3.1 describes the proposed system architecture. First, we study some papers related to our work. Then we select a dataset; we use the World Development Indicators (WDI-2020) for our work. Then we prepare our data. In the next step, we apply the ANOVA test to identify significant factors. Then we build a regression-based model, and finally we apply various performance measurement criteria to the model. Section 3.1.1 discusses the literature review; Section 3.1.2, the source of data; Section 3.1.3, data preparation; Section 3.1.4, the ANOVA test; Section 3.1.5, identifying significant and relevant factors; Section 3.1.6, building the regression model; and Section 3.1.7, the performance measurement criteria.
After reading some related work, we needed to find a reliable data source for our work. For this study, we use data extracted from the World Bank's “World Development Indicators” (WDI-2020) for Bangladesh.
Once we have a reliable dataset, the next step is data preparation. The data used in this study cover up to 61 years, from 1960 to 2020. Not all indicators have values for all of these years, so we select indicators based on the amount of data available in the dataset. These indicators provide data for the period 1991-2020.
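The indicator-selection step can be illustrated with a small filter. We assume the WDI data have already been loaded into a dictionary that maps each indicator name to its yearly values (missing years stored as `None`); that layout and the helper name `select_complete_indicators` are hypothetical, and the thesis does not say whether partially complete indicators were kept.

```python
def select_complete_indicators(table, start=1991, end=2020):
    """Keep indicators with a non-missing value for every year in [start, end].

    `table` maps indicator name -> {year: value or None} (hypothetical layout).
    Returns indicator name -> list of values ordered by year.
    """
    years = range(start, end + 1)
    return {
        name: [series[yr] for yr in years]
        for name, series in table.items()
        if all(series.get(yr) is not None for yr in years)
    }
```

Raising `start` trades a shorter time span for more usable indicators, which is the trade-off behind choosing the 1991-2020 window.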
After preparing the records, we ran an ANOVA test on the data. After filtering the indicators, we investigate the significant factors of female labor participation. We use the F-statistic to examine the results of the analysis of variance (ANOVA), with a significance threshold of p < 0.05. Following the ANOVA test, we obtain a list of significant and suitable factors to use in model construction.
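The screening described above (a linear regression of the target on each indicator, judged by its F-statistic at p < 0.05) can be sketched in plain Python. The thesis performs this step in RStudio; the sketch below is an illustration under our own assumptions. It computes the F-test p-value through the regularized incomplete beta function, using the identity P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1·f), and a Numerical Recipes style continued fraction; in practice `scipy.stats.f.sf` or R's `summary(lm(...))` gives the same number. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_pvalue(f_stat, df1, df2):
    """Upper-tail probability P(F > f_stat) of the F(df1, df2) distribution."""
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))


def regression_f_test(x, y):
    """F statistic and p-value of the simple regression y = b0 + b1 * x (1 and n - 2 df)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ssr = sum((yi - y_bar) ** 2 for yi in y) - sse   # variation explained by the line
    if sse == 0.0:
        return math.inf, 0.0
    f_stat = ssr / (sse / (n - 2))
    return f_stat, f_pvalue(f_stat, 1, n - 2)


def screen_indicators(indicators, target, alpha=0.05):
    """Names of indicators whose single-predictor regression on `target` has p < alpha."""
    keep = []
    for name, series in indicators.items():
        _, p = regression_f_test(series, target)
        if p < alpha:
            keep.append(name)
    return keep
```

As a check, `f_pvalue(27.0, 2, 6)` returns 0.001, matching the closed form (d2 / (d2 + 2f))^(d2/2) that holds when the numerator has two degrees of freedom. Relevance is a separate, literature-based judgment, so the names returned here would still be filtered by hand.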
Through the ANOVA test, we obtain significant factors for the female labor market; in this test, we apply linear regression to each factor to judge its significance. For this research, we rely on performance measurement, the literature review, and the collected dataset. Some significant factors do not appear in the literature, so we treat them as significant but not relevant and ignore them. We identify the relevant factors through the literature review. Finally, we keep only the factors that are both significant and relevant to the female labor market.
After finding the significant factors, we build the model with the linear regression algorithm from the classification methods. Classification is a data-analysis process that builds models to assign data to one of a set of predefined classes, whereas a regression algorithm predicts an outcome from the independent variables. Stepwise linear regression adds or removes predictors step by step and checks whether the model as a whole is significant. As a result, we obtain a large number of candidate models. We then analyze which model has the highest accuracy by considering predefined performance measurement criteria [20]. A regression equation is shown in equation (1).
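Stepwise selection can be sketched as a greedy loop. This version is forward-only and uses adjusted R² as its criterion, which is our assumption: R's `step()` uses AIC by default, and the thesis does not state its criterion. `solve`, `fit_ols`, and `forward_stepwise` are hypothetical helpers built on the normal equations.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(columns, y):
    """OLS with an intercept via the normal equations; returns (coefficients, adjusted R^2)."""
    n, p = len(y), len(columns)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p + 1)] for a in range(p + 1)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p + 1)]
    beta = solve(XtX, Xty)
    y_bar = sum(y) / n
    sse = sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    return beta, 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_stepwise(candidates, y, min_gain=1e-9):
    """Greedily add the predictor that most improves adjusted R^2; stop when none does."""
    selected, best_adj = [], float("-inf")
    while True:
        best_name = None
        for name in candidates:
            if name in selected:
                continue
            _, adj = fit_ols([candidates[c] for c in selected + [name]], y)
            if adj > best_adj + min_gain:
                best_adj, best_name = adj, name
        if best_name is None:
            return selected, best_adj
        selected.append(best_name)
```

For example, with x1 = 1..8, x2 alternating 2, 1, 2, 1, ... and y = 1 + 2·x1 + 0.5·x2, the loop selects x1 first and then x2, and `fit_ols` recovers the coefficients [1, 2, 0.5].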
In the above equations, ya denotes the actual value from the dataset, ȳa is the average of the actual values, and yb is the predicted value generated from the model. Moreover, we examine the model's R-squared and adjusted R-squared values to assess its accuracy. We also run linear regression in WEKA to obtain the measurement criteria MAE, RAE, and RMSE, so that we can compare the values obtained from the equations with those reported by WEKA.
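The accuracy measures in this paragraph can be sketched as follows. The formulas are the standard ones, and `relative_absolute_error` compares the model with always predicting the mean of the actual values; WEKA may compute that baseline mean from the training data instead, so treat this as an approximation. Function names are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_a = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def relative_absolute_error(actual, predicted):
    """RAE in percent: total |error| relative to always predicting the mean of `actual`."""
    mean_a = sum(actual) / len(actual)
    num = sum(abs(a - p) for a, p in zip(actual, predicted))
    den = sum(abs(a - mean_a) for a in actual)
    return 100.0 * num / den
```

As a consistency check against the thesis's own tables, which use 30 yearly observations (1991-2020): `adjusted_r_squared(0.7844, 30, 1)` gives about 0.7767, the adjusted value reported for Self Employed, and `adjusted_r_squared(0.9827, 30, 4)` gives about 0.9799, the value reported for the four-factor model.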
3.2 Summary
In this chapter, we discussed the steps of our proposed system model. First, we choose the dataset, prepare the data, apply the ANOVA test to these data, and find the significant factors. Then, using the significant factors, we build a prediction model and apply various performance measures. In the next chapter, we discuss the implementation.
Chapter 4
Implementation
In the previous chapter, we discussed our proposed system model; in this chapter, we discuss the implementation. This is the most important chapter. The implementation has several steps. First, we prepare our data. Then we apply analysis of variance to the data and identify significant factors. Then we build a regression model. To find a better model, we also apply a neural network, which gives higher accuracy. Section 4.1 discusses the implementation steps: Section 4.1.1, data preparation; Section 4.1.2, identifying significant factors; Section 4.1.3, the analysis of variance; and Section 4.1.4, building the model. In the next chapter, we discuss the results of our study and how we improve its performance.
We discard indicators that have only a small amount of data and little significance for building models. After data preparation, we obtain 438 indicators for which data are available for the period 1991-2020, and after significance analysis, 32 indicators remain. On the dataset for 1991-2020, we apply the ANOVA test to identify significant and relevant factors. Significance testing with p-value < 0.05 identified some factors that do not appear in the literature but are significant for female employment. Following our proposed system, we remove these significant but irrelevant factors from the dataset. The remaining significant factors are added to the list of factors identified in the literature, giving six factors that are both significant and relevant. Table 1 shows these six factors obtained after the ANOVA test: self-employed, industry employed, vulnerable employed, agriculture employed, employers, and service employed. We apply a linear regression algorithm to these six factors. The linear regression model shows that four of the factors are correlated with each other, and we can build a prediction model from those four factors. Table 2 describes the factors of our final model.
Factor name            Adjusted R-squared   Multiple R-squared   P-value     Significance
Self Employed          0.7767               0.7844               7.853e-11   ***
Industry Employed      0.9582               0.9597               <2.2e-16    ***
Vulnerable Employed    0.7824               0.7899               5.456e-11   ***
Agriculture Employed   0.8741               0.8784               2.444e-14   ***
Employers              0.8266               0.8326               2.214e-12   ***
Service Employed       0.7705               0.7784               1.156e-10   ***
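The stars in the last column follow R's significance codes. A small helper that reproduces them is shown below; `significance_stars` is a hypothetical name, and the behavior exactly at the cut-off values may differ slightly from R's `symnum()`.

```python
def significance_stars(p):
    """Map a p-value to R's significance codes: '***' < 0.001 <= '**' < 0.01 <= '*' < 0.05 <= '.' < 0.1."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "."
    return " "
```

Every p-value in the table above maps to '***'.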
Analysis of Variance
Using the RStudio IDE, we calculate the analysis of variance of our dataset. The sam-
Figure 4.3 is the histogram of self-employed. Its highest frequency is 14 and its lowest frequency is 1.
No.  Factor                 Estimate    Std. error   t value   Pr(>|t|)    Significance
1    Intercept              20.68140    4.63235      4.465     0.000149    ***
2    Vulnerable employed    -7.22920    1.48063      -4.883    5.05e-05    ***
3    Self employed          7.30043     1.51565      4.817     5.99e-05    ***
4    Agriculture employed   -0.10162    0.03286      -3.093    0.004829    **
5    Industry employed      0.36365     0.11853      3.068     0.005125    **
Multiple R-squared: 0.9827    Adjusted R-squared: 0.9799
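Read as an equation, the estimates in the table above give the fitted model below, where ŷ is the predicted female employment participation and each x is the corresponding indicator; the symbol names are ours. Each t value is the estimate divided by its standard error, as the second line checks for the self-employed term.

```latex
\begin{aligned}
\hat{y} &= 20.68140 - 7.22920\,x_{\mathrm{vulnerable}} + 7.30043\,x_{\mathrm{self}} - 0.10162\,x_{\mathrm{agriculture}} + 0.36365\,x_{\mathrm{industry}} \\
t_{\mathrm{self}} &= \frac{7.30043}{1.51565} \approx 4.817
\end{aligned}
```

The nearly equal and opposite coefficients on vulnerable employed and self-employed mean that predictions depend strongly on the difference between those two indicators.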
4.2 Summary
In this chapter, we discussed the implementation of our work. First, we prepare our dataset, then identify factors by applying the ANOVA test, and build a prediction model with those factors. In the next chapter, we discuss the results of our work and the performance improvement.
Chapter 5
Result and Discussion
5.1 Result
A residual is the difference between the observed value of the dependent variable (y) and the predicted value (ŷ). The residuals versus fitted plot tests whether the relationship between the variables is linear (linearity) and whether there is equal variance along the regression line. The plot shows residuals on the y-axis and fitted values on the x-axis, and we use it to detect non-linearity. In a good residuals versus fitted plot, the points are distributed around the horizontal line at 0 without a clear pattern, outliers, or particularly large residuals. If the points lie around a horizontal line without outliers, this indicates a linear relationship between the dependent and independent variables; otherwise, the relationship is non-linear.
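The residuals behind this plot can be computed directly. The sketch below is an illustration on toy quadratic data, not on the thesis dataset; `residuals_vs_fitted` is a hypothetical helper that fits a straight line and returns the fitted values and residuals e = y - ŷ that such a plot would show.

```python
def residuals_vs_fitted(x, y):
    """Fit y = b0 + b1 * x by least squares and return (fitted, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]   # e = y - y_hat
    return fitted, residuals


if __name__ == "__main__":
    # Toy data with a curved (quadratic) relationship, not thesis data.
    x = list(range(7))
    y = [xi ** 2 for xi in x]
    fitted, res = residuals_vs_fitted(x, y)
    for f, e in zip(fitted, res):
        print(f"{f:6.1f} {e:6.1f}")
```

It prints residuals 5, 0, -3, -4, -3, 0, 5 against fitted values from -5 to 31: positive at both ends and negative in the middle. That U-shape is the kind of pattern that signals a non-linear relationship, even though the residuals still average to zero.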
Figure 5.1 shows the residual versus fitted plot of our model. From the figure, we see that the relationship between the input factors and the output is not linear.
A Q-Q plot is a plot of the quantiles of the first dataset against the quantiles of the second dataset. A 45-degree reference line is also plotted. If the two sets come from a population with the same distribution, the points should fall approximately along this reference line. The greater the departure from this reference line, the greater the evidence that the two datasets have come from populations with different distributions.
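The coordinates of a normal Q-Q plot can be computed with the standard library. This sketch compares a sample (for example, a model's residuals) with the standard normal distribution using plotting positions (i - 0.5)/n, which is what R's `ppoints()` uses for samples larger than 10; `normal_qq_points` is a hypothetical name. Standardize the sample first if the points are to be compared with a 45-degree line.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Pairs (theoretical standard-normal quantile, sample quantile) for a normal Q-Q plot."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))
```

Plotting the second coordinate against the first gives the Q-Q plot; points close to a straight line suggest the sample is roughly normal.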
Figure 5.2 shows the Q-Q plot of our model. From Figure 5.2, we see that the points lie close to the reference line and are not widely scattered. From this, we can assume that the residuals are approximately normally distributed.
A boxplot is a graph that gives a good indication of how the values in the data are spread out. Although boxplots may seem primitive in comparison with a histogram or density plot, they have the advantage of taking up less space, which is useful when comparing distributions between many groups or datasets. The error boxplots are given below: Figure 5.3 shows the boxplot of MAE, Figure 5.4 shows the boxplot of MRE, and Figure 5.5 shows the boxplot of RMSE.
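The statistics a boxplot draws can be computed as below, for example from the per-fold error values of a cross-validation run. This is a sketch under our own assumptions: quartiles use `statistics.quantiles(..., method="inclusive")`, which matches R's default `quantile()` type 7, while R's `boxplot()` uses Tukey's hinges, so the box edges may differ slightly for small samples. `boxplot_stats` is a hypothetical name.

```python
import statistics


def boxplot_stats(values):
    """Quartiles, whiskers (1.5 * IQR rule), and outliers for a boxplot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```

For [1, 2, 3, 4, 5, 100], it gives quartiles 2.25, 3.5, and 4.75, whiskers at 1 and 5, and flags 100 as an outlier.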
Figure 5.3 shows the boxplot of MAE, whose lowest error is -0.1 and highest error is 0.2.
Figure 5.4 shows the boxplot of MRE, whose lowest error is above 0.012 and highest error is below 0.016.
Figure 5.5 shows the boxplot of RMSE, whose lowest error starts from 0.075 and highest error is just below 0.23.
Figure 5.6 shows the relationship between the actual and predicted participation of female employment. From this graph, we can conclude that the actual and predicted values are nearly identical.
the blue line displays the bias term. Our model has four input neurons, three hidden neurons, and one output neuron. The error of our neural network model is small: 0.02359.
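A network with four inputs, three hidden neurons, and one output can be sketched from scratch. The thesis does not state the library, activation function, or training algorithm, so everything here is an assumption: logistic hidden units, a linear output, and plain batch gradient descent on half the sum of squared errors (the error measure that R's `neuralnet` package reports, and that package's plots draw bias terms in blue; whether it was used is our inference). `TinyMLP` is a hypothetical class.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """4 inputs -> 3 logistic hidden units -> 1 linear output (each row stores its bias last)."""

    def __init__(self, n_in=4, n_hidden=3, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def _forward(self, x):
        # zip() stops before the bias entry, which is added separately.
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in self.w1]
        out = sum(w * h for w, h in zip(self.w2, hidden)) + self.w2[-1]
        return hidden, out

    def predict(self, x):
        return self._forward(x)[1]

    def error(self, X, y):
        """Half the sum of squared errors over the data set."""
        return 0.5 * sum((self.predict(x) - t) ** 2 for x, t in zip(X, y))

    def train(self, X, y, lr=0.5, epochs=500):
        n = len(X)
        for _ in range(epochs):
            g1 = [[0.0] * len(row) for row in self.w1]
            g2 = [0.0] * len(self.w2)
            for x, t in zip(X, y):
                hidden, out = self._forward(x)
                err = out - t  # derivative of 0.5 * (out - t)^2 with respect to out
                for j, h in enumerate(hidden):
                    g2[j] += err * h
                    delta = err * self.w2[j] * h * (1.0 - h)  # back through the sigmoid
                    for i, xi in enumerate(x):
                        g1[j][i] += delta * xi
                    g1[j][-1] += delta
                g2[-1] += err
            # Average the gradients over the batch, then take one descent step.
            for j, row in enumerate(self.w1):
                for i in range(len(row)):
                    row[i] -= lr * g1[j][i] / n
            for j in range(len(self.w2)):
                self.w2[j] -= lr * g2[j] / n
        return self.error(X, y)
```

The learning rate and number of epochs are arbitrary values for the sketch, and inputs should be scaled (for example to [0, 1]) before training.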
Figure 5.8 shows the scatterplot of the neural network model, which compares the predicted output with the actual output. From Figure 5.8, we can say that the actual and neural-network-predicted outputs follow a linear relationship.
The 5-fold cross-validation results with the neural network are shown in Table 5.4. The resulting error is very small after using the neural network algorithm.
Figure 5.9 shows the relationship between the actual and the neural-network-predicted participation of female employment using the full training dataset. From this figure, we can tell that the actual and predicted values are about the same.
We now use 80% of the total dataset as training data and 20% as testing data to generate a graphical view of the relationship between actual and predicted values.
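The 80%/20% split can be sketched as follows. `split_train_test` is a hypothetical stdlib helper. The thesis does not say whether the split was random or by year: `shuffle=True` draws a random 20% of the years for testing, while `shuffle=False` keeps the years in order so the last 20% form the test set.

```python
import random


def split_train_test(X, y, train_fraction=0.8, shuffle=True, seed=0):
    """Split paired data into training and testing parts (default 80% / 20%)."""
    idx = list(range(len(X)))
    if shuffle:
        random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])
```

With the 30 yearly observations for 1991-2020, either option gives 24 training years and 6 testing years.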
After training on 80 percent of the dataset, the predicted values are shown together with the actual values in Figure 5.10. The performance improves slightly when the 80 percent training set is used instead of the full training set.
Figure 5.11: Line Chart of Actual VS Neural Predicted Participation (For 20%
Testing Data Set)
When the model is applied to the 20% testing set (Figure 5.11), the predicted values move away from the actual values.
In Figures 5.10 and 5.11, we used 80% of the dataset for training and 20% for testing. Using 80% of the data for training improves performance compared with using the entire dataset, but on the 20% test set the predictions move away from the actual values, which means that performance decreases on the testing data.
5.3 Discussion
According to the model, variables such as self-employed and industry employed have a significant positive effect on female work participation, whereas vulnerable employed and agriculture employed have a negative effect. The contribution of the female labor force is found to be very large in comparison with the total labor force. We found three regression models, of which the base model is the best. Each of the other two models consists of four factors: one contains vulnerable employed, self-employed, service employed, and industry employed, and the other contains service employed, self-employed, agriculture employed, and vulnerable employed. Vulnerable employed and self-employed are common to all three models, so we analyze the models and our datasets to find what effects or relations self-employed and vulnerable employed have in our models.
Figure 5.12 illustrates the actual values together with the predictions of the linear regression model and the neural network model. From this figure, we can say that the neural network model gives better results than the linear regression model. We aimed to improve the performance of our model using a neural network, and we were able to do so at least to some extent.
5.4 Summary
In this chapter, we discussed the residual versus fitted plot, the Q-Q plot, k-fold cross-validation, performance measurement with other techniques, the boxplots of MAE, MRE, and RMSE, performance improvement, and so on. In the next chapter, we discuss the limitations and future work of our study.
Chapter 6
Limitations and Future Works
In the previous chapter, we discussed our results; in this chapter, we identify our limitations and future work. We tried to create a system with as few errors as possible, but it still has some limitations, which could be addressed with other approaches described in this chapter. Section 6.1 discusses the limitations of our system, and Section 6.2 discusses future work to address them.
6.1 Limitations
Limitations are aspects of the design or procedures that influenced the interpretation of our research findings. Limitations can be a valuable tool for identifying new gaps in the literature and indicating the need for additional studies.
First, the sample size we used was insufficient. Because the sample is so small (annual data for 1991-2020), finding significant associations, patterns, and meaningful relationships in our data is difficult.
Second, although we identified probable factors, we did not include them because of a lack of data.
Third, because of the lack of prior research on Bangladesh, it may be necessary to develop an entirely new research typology, which seems very difficult.
Fourth, other methods (association, clustering, and decision trees) may also be suitable for this research, but they were not checked in this implementation.
Fifth, a bigger dataset would allow more folds, which would make the cross-validation estimates more accurate.
Sixth, not all fractional digits are taken into consideration, and we use approximate fractional values, so the error rate may increase slightly.
6.2 Future Works
First, we removed indicators of low significance, so we may not have obtained our preferred result; in the future, we will try to analyze all indicators, of both low and high significance.
Second, we will try to find a model that fits both linear and non-linear relationships.
Third, the cross-validation model should be made more accurate in future studies.
Fourth, we can use other algorithmic techniques to obtain a better model.
6.3 Summary
In this chapter, we highlighted the limitations of our work, the main one being the lack of data on possible factors, and we outlined steps to overcome these limitations. Various approaches, including association, clustering, and decision trees, should be tried, and the accuracy of the cross-validation results needs to be improved.
References
[4] V. Motkuri, “Caste and rural youth in India: Education, skills and employment,” 2013.
[8] D. J. Hand, “Principles of data mining,” Drug Safety, vol. 30, no. 7, pp. 621–622, 2007.
[9] F. Gorunescu, Data Mining: Concepts, Models and Techniques, vol. 12. Springer Science & Business Media, 2011.
[10] R. P, “https://bigdata-madesimple.com/14-useful-applications-of-data-mining/,” Aug. 20, 2014 [Online].
[11] D. Enke and S. Thawornwong, “The use of data mining and neural networks for forecasting stock market returns,” Expert Systems with Applications, vol. 29, no. 4, pp. 927–940, 2005.
[13] I. H. Witten and E. Frank, “Data mining: Practical machine learning tools and techniques,” 2005.
[22] A. Luci, “Female labour market participation and economic growth,” International Journal of Innovation and Sustainable Development, vol. 4, no. 2-3, pp. 97–108, 2009.
tional Journal of Human Resource Management, vol. 26, no. 6, pp. 688–706, 2015.
[27] H. Lu, R. Setiono, and H. Liu, “Effective data mining using neural networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 957–961, 1996.