
The Narrowing Male-Female Unemployment Differential

1. INTRODUCTION

2. PRESENTATION OF PREPARED DATA

3. ANALYSIS & INTERPRETATION

4. GENERAL DISCUSSION

5. CONCLUSION


1. Introduction

Unemployment rates of developed and developing countries pose a complicated puzzle. Most
developed countries have higher unemployment rates than developing countries.

Unemployment (or joblessness) occurs when people are without work and actively seeking
work. The unemployment rate is a measure of the prevalence of unemployment and is calculated as a
percentage by dividing the number of unemployed individuals by the number of individuals currently in the
labor force. During periods of recession, an economy usually experiences a relatively high unemployment
rate. According to an International Labour Organization report, more than 197 million people globally,
or about 6% of the world's workforce, were without a job in 2012.
As gender roles have followed the formation of agricultural and then industrial societies, newly
developed professions and fields of occupation have been frequently inflected by gender. Some
examples of the ways in which gender affects a field include:

Prohibitions or restrictions on members of a particular gender entering a field or studying a field;


Discrimination within a field, including wage, management, and prestige hierarchies;
Expectation that mothers, rather than fathers, should be the primary childcare providers.

Note that these gender restrictions may not be universal in time and place, and that they operate to
restrict both men and women. However, in practice, norms and laws have historically restricted
women's access to particular occupations; civil rights laws and cases have thus primarily focused on
equal access to and participation by women in the workforce. These barriers may also be manifested in
hidden bias and by means of many micro-inequities.

Women in the workforce earning wages or a salary are part of a modern phenomenon, one that
developed at the same time as the growth of paid employment for men; yet women have been
challenged by inequality in the workforce. Until modern times, legal and cultural practices, combined
with the inertia of longstanding religious and educational conventions, restricted women's entry and
participation in the workforce. Economic dependency upon men, and consequently the poor socioeconomic status of women, have had the same impact, particularly as occupations have become
professionalized over the 19th and 20th centuries.

The main objective of the study is to identify how the unemployment rate of male workers affects the
unemployment rate of female workers.
In addition, by summarizing the data using descriptive statistics and presenting the main
features graphically, we can also compare how each variable affects the other.

2. PRESENTATION OF INFORMATION
The data are a sample taken from a labour force survey conducted by a research company in Sri
Lanka. The survey was carried out as a household survey, and all the members of each randomly selected
household in the working-age population (i.e. aged 15 and over) were considered in the survey.
During the 1975-2010 period the unemployment rate for women was higher than the rate for men in
nearly every year; the exceptions are 1983, when the two rates were equal, and 2007-2008. In recent
years, however, a dramatic and unanticipated narrowing of the male-female unemployment rate
differential has occurred. In 2007 and 2008 the female rate was less than the male rate, and since
2009 the female rate has exceeded the male rate by a historically small amount. The relatively high
female unemployment rate has been taken as evidence of the disadvantages women face in the job
market, or of their relatively weak attachment to the labour force. Since the narrowing of the
male-female rate differential could indicate a change in these underlying factors, a new examination
of the male-female unemployment differential seems appropriate.

Year    Male UE (%)    Female UE (%)
1975    15.1           15.7
1976    12.8           14.4
1977    12.8           13.6
1978    12.8           13.3
1979    15.3           16.0
1980    14.2           14.9
1981    13.8           14.8
1982    14.1           14.7
1983    16.8           16.8
1984    15.2           15.9
1985    15.4           15.9
1986    16.4           17.2
1987    15.2           16.2
1988    15.2           16.5
1989    14.6           16.2
1990    14.0           15.5
1991    13.2           14.8
1992    13.1           15.2
1993    12.9           14.8
1994    12.8           14.7
1995    14.4           15.9
1996    15.3           16.9
1997    15.0           16.6
1998    14.2           16.0
1999    14.9           16.7
2000    17.9           19.3
2001    17.1           18.6
2002    16.3           18.2
2003    15.3           17.2
2004    15.1           16.8
2005    16.9           17.4
2006    17.4           17.9
2007    19.9           19.4
2008    19.9           19.2
2009    17.4           17.6
2010    17.0           17.4

3. METHODOLOGY
Statistics is a field of mathematics that pertains to data analysis. Statistical methods and equations can
be applied to a data set in order to analyze and interpret results, explain variations in the data, or predict
future data. A few examples of statistical information we can calculate are:

Average value (mean)

Most frequently occurring value (mode)

On average, how much each measurement deviates from the mean (standard deviation)

Span of values over which the data set occurs (range), and

Middle value of the ordered data set (median)

Mean
The mean is obtained by dividing the sum of observed values by the number of observations, n.
Although data points fall above, below, or on the mean, it can be considered a good estimate for
predicting subsequent data points. The formula for the mean is

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Median
The median is the middle value of a set of data containing an odd number of values, or the average of
the two middle values of a set of data with an even number of values. The median is especially helpful
when separating data into two equal sized bins.

Standard Deviation
The standard deviation gives an idea of how close the entire set of data is to the average value. Data
sets with a small standard deviation have tightly grouped, precise data. Data sets with large standard
deviations have data spread out over a wide range of values.

Variance
In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out.
It is one of several descriptors of a probability distribution, describing how far the numbers lie from
the mean (expected value).
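
As a rough illustration of the measures defined so far, the following Python sketch computes them for the two series in the data table using only the standard library. The list names `male` and `female` and the helper `describe` are illustrative choices, not part of the original analysis, and the sample (n - 1) forms of the variance and standard deviation are assumed.

```python
import statistics

# Annual unemployment rates (%), 1975-2010, copied from the data table.
male = [15.1, 12.8, 12.8, 12.8, 15.3, 14.2, 13.8, 14.1, 16.8, 15.2, 15.4, 16.4,
        15.2, 15.2, 14.6, 14.0, 13.2, 13.1, 12.9, 12.8, 14.4, 15.3, 15.0, 14.2,
        14.9, 17.9, 17.1, 16.3, 15.3, 15.1, 16.9, 17.4, 19.9, 19.9, 17.4, 17.0]
female = [15.7, 14.4, 13.6, 13.3, 16.0, 14.9, 14.8, 14.7, 16.8, 15.9, 15.9, 17.2,
          16.2, 16.5, 16.2, 15.5, 14.8, 15.2, 14.8, 14.7, 15.9, 16.9, 16.6, 16.0,
          16.7, 19.3, 18.6, 18.2, 17.2, 16.8, 17.4, 17.9, 19.4, 19.2, 17.6, 17.4]


def describe(values):
    """Return the basic descriptive statistics for one series."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),    # every most-frequent value
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


male_stats = describe(male)
female_stats = describe(female)
for name, stats in (("Male", male_stats), ("Female", female_stats)):
    print(name, {key: value for key, value in stats.items()})
```

`statistics.multimode` is used rather than `statistics.mode` so that a series with more than one mode reports all of them.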

Skewness
Skewness is the degree of departure from symmetry of a
distribution. A positively skewed distribution has a "tail" which is
pulled in the positive direction. A negatively skewed distribution
has a "tail" which is pulled in the negative direction.
Since we have a large sample, it is appropriate to use the Fisher-Pearson statistic to calculate skewness.
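
A minimal sketch of the Fisher-Pearson calculation follows. It assumes the sample-adjusted form of the coefficient, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations, together with the usual standard-error formula; whether this is exactly the variant used for the report's tables is an assumption, and the function names are illustrative.

```python
import math
import statistics

# Annual unemployment rates (%), 1975-2010, copied from the data table.
male = [15.1, 12.8, 12.8, 12.8, 15.3, 14.2, 13.8, 14.1, 16.8, 15.2, 15.4, 16.4,
        15.2, 15.2, 14.6, 14.0, 13.2, 13.1, 12.9, 12.8, 14.4, 15.3, 15.0, 14.2,
        14.9, 17.9, 17.1, 16.3, 15.3, 15.1, 16.9, 17.4, 19.9, 19.9, 17.4, 17.0]
female = [15.7, 14.4, 13.6, 13.3, 16.0, 14.9, 14.8, 14.7, 16.8, 15.9, 15.9, 17.2,
          16.2, 16.5, 16.2, 15.5, 14.8, 15.2, 14.8, 14.7, 15.9, 16.9, 16.6, 16.0,
          16.7, 19.3, 18.6, 18.2, 17.2, 16.8, 17.4, 17.9, 19.4, 19.2, 17.6, 17.4]


def adjusted_skewness(values):
    """Adjusted Fisher-Pearson skewness (assumed sample-corrected form)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    cubed = sum(((v - mean) / s) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed


def skewness_std_error(n):
    """Standard error of the adjusted skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


male_skew = adjusted_skewness(male)
female_skew = adjusted_skewness(female)
std_error = skewness_std_error(len(male))
print(f"male {male_skew:.3f}, female {female_skew:.3f}, std. error {std_error:.3f}")
```

A positive result points to a longer right-hand tail; comparing the statistic with its standard error gives a rough sense of whether the asymmetry is more than sampling noise.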

Kurtosis
Kurtosis is the degree of peakedness of a distribution.

A normal distribution is a mesokurtic distribution.


A pure leptokurtic distribution has a higher peak than the
normal distribution and has heavier tails.
A pure platykurtic distribution has a lower peak than a
normal distribution and lighter tails.

Most departures from normality display combinations of both skewness and kurtosis different from a
normal distribution.
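
The kurtosis measure can be sketched in the same way. The code below assumes the sample excess-kurtosis formula (zero for a normal distribution, positive for leptokurtic, negative for platykurtic) used by common statistics packages; the function names are illustrative and not taken from the original analysis.

```python
import math
import statistics

# Annual unemployment rates (%), 1975-2010, copied from the data table.
male = [15.1, 12.8, 12.8, 12.8, 15.3, 14.2, 13.8, 14.1, 16.8, 15.2, 15.4, 16.4,
        15.2, 15.2, 14.6, 14.0, 13.2, 13.1, 12.9, 12.8, 14.4, 15.3, 15.0, 14.2,
        14.9, 17.9, 17.1, 16.3, 15.3, 15.1, 16.9, 17.4, 19.9, 19.9, 17.4, 17.0]
female = [15.7, 14.4, 13.6, 13.3, 16.0, 14.9, 14.8, 14.7, 16.8, 15.9, 15.9, 17.2,
          16.2, 16.5, 16.2, 15.5, 14.8, 15.2, 14.8, 14.7, 15.9, 16.9, 16.6, 16.0,
          16.7, 19.3, 18.6, 18.2, 17.2, 16.8, 17.4, 17.9, 19.4, 19.2, 17.6, 17.4]


def excess_kurtosis(values):
    """Sample excess kurtosis (assumed bias-corrected form)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    fourth = sum(((v - mean) / s) ** 4 for v in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def kurtosis_std_error(n):
    """Standard error of the sample excess kurtosis for sample size n."""
    ses = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return 2 * ses * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


male_kurt = excess_kurtosis(male)
female_kurt = excess_kurtosis(female)
kurt_error = kurtosis_std_error(len(male))
print(f"male {male_kurt:.3f}, female {female_kurt:.3f}, std. error {kurt_error:.3f}")
```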
Correlation
When two sets of data are strongly linked together, we say they have a high correlation.

Correlation is positive when the values increase together, and

Correlation is negative when one value decreases as the other increases.

The correlation coefficient takes a value between -1 and 1:

1 is a perfect positive correlation

0 is no correlation (the values don't seem linked at all)

-1 is a perfect negative correlation

The value shows how good the correlation is (not how steep the line is), and whether it is positive or
negative.
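
As an illustration, the Pearson correlation coefficient between the two series in the data table can be computed directly from its definition. This is a sketch using plain Python; the function name `pearson_r` is an illustrative choice.

```python
import math

# Annual unemployment rates (%), 1975-2010, copied from the data table.
male = [15.1, 12.8, 12.8, 12.8, 15.3, 14.2, 13.8, 14.1, 16.8, 15.2, 15.4, 16.4,
        15.2, 15.2, 14.6, 14.0, 13.2, 13.1, 12.9, 12.8, 14.4, 15.3, 15.0, 14.2,
        14.9, 17.9, 17.1, 16.3, 15.3, 15.1, 16.9, 17.4, 19.9, 19.9, 17.4, 17.0]
female = [15.7, 14.4, 13.6, 13.3, 16.0, 14.9, 14.8, 14.7, 16.8, 15.9, 15.9, 17.2,
          16.2, 16.5, 16.2, 15.5, 14.8, 15.2, 14.8, 14.7, 15.9, 16.9, 16.6, 16.0,
          16.7, 19.3, 18.6, 18.2, 17.2, 16.8, 17.4, 17.9, 19.4, 19.2, 17.6, 17.4]


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(male, female)
print(f"r = {r:.3f}")
```

The coefficient is symmetric in its arguments, so swapping `male` and `female` gives the same value.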

Time Series Analysis


A time series is simply a series of data through time. The data can be of different types: continuous
(e.g. temperature), binary (e.g. presence-absence of a species) or nominal (e.g. three different rock types
through a section). The data are usually univariate, but methods have also been devised for multivariate
time series analysis. Many methods require that the data are evenly spaced along the time line, and
unevenly spaced data will then have to be interpolated before analysis. In principle, there is nothing
special about a time series from a mathematical point of view: it is simply a function of a single
variable (time). But in practice, time series analysis involves a particular set of problems and methods,
and it is therefore a distinct field within statistics and engineering. Typical questions
asked in time series analysis are:

Are there periodicities in the data, maybe controlled by daily or annual cycles?
Is there a trend?
Are two time series (e.g. global temperature and CO2 levels) correlated? If so, is there a delay
between the two?

A variety of methods have been invented to investigate such problems. Although the treatment here is
kept simple, it cannot be denied that time series analysis is a rather complicated and technical
field, with many pitfalls and subtleties.

Regression
Linear regression uses one independent variable to explain and/or predict the outcome of Y.
The general form of linear regression is:

Linear Regression: Y = mX + b

Where:
Y = the variable that we are trying to predict
X = the variable that we are using to predict Y
b = the intercept
m = the slope

Regression takes a group of random variables, thought to be predicting Y, and tries to find a
mathematical relationship between them. This relationship is typically in the form of a straight line
(linear regression) that best approximates all the individual data points. Regression is often used to
determine how much specific factors such as the price of a commodity, interest rates, particular
industries or sectors influence the price movement of an asset.

When we choose to analyze data using linear regression, part of the process involves checking that
the data can actually be analyzed using linear regression. We need to do this because it is only
appropriate to use linear regression if the data "pass" six assumptions that are required for linear
regression to give a valid result.

Assumption #1: The two variables should be measured at the interval or ratio level (i.e., they
are continuous). Examples of variables that meet this criterion include revision time (measured
in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100),
weight (measured in kg), and so forth.

Assumption #2: There needs to be a linear relationship between the two variables. Whilst there
are a number of ways to check whether a linear relationship exists, a simple one is to plot the
dependent variable against the independent variable in a scatterplot and then visually inspect
the scatterplot for linearity.

Assumption #3: There should be no significant outliers. Outliers are single data points
within the data that do not follow the usual pattern. The problem with outliers is that they can
have a negative effect on the regression equation that is used to predict the value of the
dependent (outcome) variable based on the independent (predictor) variable. This reduces the
predictive accuracy of the results.

Assumption #4: The observations should be independent, which can be checked using
the Durbin-Watson statistic.

Assumption #5: The data need to show homoscedasticity, which is where the variances along the
line of best fit remain similar as you move along the line.

Assumption #6: Finally, we need to check that the residuals (errors) of the regression line
are approximately normally distributed.

4. ANALYSIS & INTERPRETATION


Summary statistics were calculated for the male unemployment data set; the results are reported in Table 1.

A skewness coefficient based on the mode is more appropriate for a small data set (on the order of
n = 10 observations); since our sample size is 36, the Fisher-Pearson equation is more meaningful.

Summary statistics were likewise calculated for the female unemployment data set.

The female unemployment data set is bi-modal, with modes at 14.8 and 15.9; therefore we calculate skewness
for both.

Given below are the descriptive statistics computed for the data set.

Descriptive Statistics
                                                                                  Skewness                Kurtosis
                      N    Minimum  Maximum  Mean     Std. Deviation  Variance   Statistic  Std. Error   Statistic  Std. Error
Male Unemployment     36   12.80    19.90    15.2694  1.85547         3.443      .734       .393         .380       .768
Female Unemployment   36   13.30    19.40    16.3389  1.52994         2.341      .209       .393         -.330      .768

Table 1

The mean unemployment rate is higher for the female workers.

Male unemployment has a positive kurtosis, which indicates a leptokurtic distribution with a higher
peak than the normal distribution and heavier tails.

Female unemployment has a negative kurtosis, which indicates a platykurtic distribution with a lower
peak than a normal distribution and lighter tails.

From the box-and-whisker plot we can see that female unemployment is close to symmetrically
distributed while male unemployment is skewed.

In both distributions the median is less than the mean, indicating they are positively skewed.

Scatter plot for the data set.

Figure 1

The scatter plot (Figure 1) shows how the unemployment rates for males and females change over the time
period. It also shows that both rates move similarly, indicating a positive correlation.

Validating assumptions
Assumption #1: Both data sets are percentage values and are thus measured at the ratio level.
Assumption #2: The scatter plot above shows a positive correlation, indicating that a linear relationship
exists between the two variables.
Assumption #3: No significant outliers are found on examining the scatter plot (Figure 1).
Assumption #4: Independence of observations is taken as satisfied on sampling grounds, since the sampling
of one person does not affect the outcome for a second person. Note, however, that the Durbin-Watson
statistic (0.361) does not fall between the lower and upper bounds: it lies far below 2, and a value this
low is usually read as a sign of positive autocorrelation in the residuals, so this assumption should be
treated with caution.

Assumption #5: Homoscedasticity is the assumption that the variability of the data around the regression
line is constant for all values of X. In other words, the error must be independent of X. Generally, this
assumption may be tested by plotting the X or Y values against the raw residuals for Y. Notice how there
is no 'fanning' pattern in the data (Figure 2), implying homoscedasticity.

Figure 2

Assumption #6: Normality of error. This assumption is often tested by simply plotting the standardized
residuals (each residual divided by its standard error) on a histogram with a superimposed normal
distribution.

Figure 3

From the histogram and the normal P-P plot we can see that the residuals approximately follow a normal
distribution; hence we conclude that the data set meets the six assumptions that were examined, and a
regression analysis can be carried out on the data set.
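
The Durbin-Watson statistic quoted under Assumption #4 can be reproduced from the residuals of the fitted line. The sketch below is a plain-Python illustration (function names are illustrative): the statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, while values well below 2 suggest positive autocorrelation.

```python
# Annual unemployment rates (%), 1975-2010, in year order (order matters here).
male = [15.1, 12.8, 12.8, 12.8, 15.3, 14.2, 13.8, 14.1, 16.8, 15.2, 15.4, 16.4,
        15.2, 15.2, 14.6, 14.0, 13.2, 13.1, 12.9, 12.8, 14.4, 15.3, 15.0, 14.2,
        14.9, 17.9, 17.1, 16.3, 15.3, 15.1, 16.9, 17.4, 19.9, 19.9, 17.4, 17.0]
female = [15.7, 14.4, 13.6, 13.3, 16.0, 14.9, 14.8, 14.7, 16.8, 15.9, 15.9, 17.2,
          16.2, 16.5, 16.2, 15.5, 14.8, 15.2, 14.8, 14.7, 15.9, 16.9, 16.6, 16.0,
          16.7, 19.3, 18.6, 18.2, 17.2, 16.8, 17.4, 17.9, 19.4, 19.2, 17.6, 17.4]


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


slope, intercept = fit_line(male, female)
residuals = [y - (intercept + slope * x) for x, y in zip(male, female)]
dw = durbin_watson(residuals)
print(f"Durbin-Watson = {dw:.3f}")
```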

A manual regression calculation can be done as follows.

Xsum - The sum of all the values in the x column (Male UE).
Ysum - The sum of all the values in the y column (Female UE).
XYsum - The sum of the products of each pair x_n and y_n recorded in the same year (the same row of the
data table).
X2sum - The sum of the squares of the values in the x column.
Y2sum - The sum of the squares of the values in the y column.
N - The total number of observations (here, 36 years).

The best form for our line is slope-intercept form, which looks like y = mx + b. Therefore, it is only
necessary to compute m and b to determine the best-fit line. Those values can be computed by the
following equations:

$$m = \frac{N \cdot \mathrm{XYsum} - \mathrm{Xsum} \cdot \mathrm{Ysum}}{N \cdot \mathrm{X2sum} - (\mathrm{Xsum})^2}, \qquad b = \frac{\mathrm{Ysum} - m \cdot \mathrm{Xsum}}{N}$$

After plugging in the values that we found, we get m = 0.767 and b = 4.632.
Since we have a large data set, SPSS was used to analyze and fit the regression line to the data set.
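
The manual calculation described above can be sketched in a few lines of Python. The variable names mirror the sums defined in the text; the slope and intercept expressions are the standard least-squares formulas, and Y2sum is used here only to obtain the correlation coefficient as a cross-check.

```python
import math

# Annual unemployment rates (%), 1975-2010, copied from the data table.
male = [15.1, 12.8, 12.8, 12.8, 15.3, 14.2, 13.8, 14.1, 16.8, 15.2, 15.4, 16.4,
        15.2, 15.2, 14.6, 14.0, 13.2, 13.1, 12.9, 12.8, 14.4, 15.3, 15.0, 14.2,
        14.9, 17.9, 17.1, 16.3, 15.3, 15.1, 16.9, 17.4, 19.9, 19.9, 17.4, 17.0]
female = [15.7, 14.4, 13.6, 13.3, 16.0, 14.9, 14.8, 14.7, 16.8, 15.9, 15.9, 17.2,
          16.2, 16.5, 16.2, 15.5, 14.8, 15.2, 14.8, 14.7, 15.9, 16.9, 16.6, 16.0,
          16.7, 19.3, 18.6, 18.2, 17.2, 16.8, 17.4, 17.9, 19.4, 19.2, 17.6, 17.4]

n = len(male)
x_sum = sum(male)                                  # Xsum
y_sum = sum(female)                                # Ysum
xy_sum = sum(x * y for x, y in zip(male, female))  # XYsum
x2_sum = sum(x * x for x in male)                  # X2sum
y2_sum = sum(y * y for y in female)                # Y2sum

# Least-squares slope and intercept of the line y = m*x + b.
m = (n * xy_sum - x_sum * y_sum) / (n * x2_sum - x_sum ** 2)
b = (y_sum - m * x_sum) / n

# Correlation coefficient from the same sums (cross-check on the fit).
r = (n * xy_sum - x_sum * y_sum) / math.sqrt(
    (n * x2_sum - x_sum ** 2) * (n * y2_sum - y_sum ** 2))

print(f"m = {m:.3f}, b = {b:.3f}, r = {r:.3f}")
```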

Table 2

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
.930   .865       .861                .57130                       .361

This table provides the R and R² values. The R value is 0.930, which represents the simple correlation
and indicates a high degree of correlation. The R² value indicates how much of the variation in the
dependent variable, "female unemployment", can be explained by the independent variable, "male
unemployment". In this case, 86.5% can be explained, which is very large.
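
To show where the Model Summary figures come from, the sketch below recomputes R², adjusted R² and the standard error of the estimate from the raw data. It assumes a single predictor and the usual textbook definitions; the function name `model_summary` is illustrative.

```python
import math

# Annual unemployment rates (%), 1975-2010, copied from the data table.
male = [15.1, 12.8, 12.8, 12.8, 15.3, 14.2, 13.8, 14.1, 16.8, 15.2, 15.4, 16.4,
        15.2, 15.2, 14.6, 14.0, 13.2, 13.1, 12.9, 12.8, 14.4, 15.3, 15.0, 14.2,
        14.9, 17.9, 17.1, 16.3, 15.3, 15.1, 16.9, 17.4, 19.9, 19.9, 17.4, 17.0]
female = [15.7, 14.4, 13.6, 13.3, 16.0, 14.9, 14.8, 14.7, 16.8, 15.9, 15.9, 17.2,
          16.2, 16.5, 16.2, 15.5, 14.8, 15.2, 14.8, 14.7, 15.9, 16.9, 16.6, 16.0,
          16.7, 19.3, 18.6, 18.2, 17.2, 16.8, 17.4, 17.9, 19.4, 19.2, 17.6, 17.4]


def model_summary(x, y, predictors=1):
    """Return (R squared, adjusted R squared, std. error of the estimate)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r_squared = sxy ** 2 / (sxx * syy)
    residual_df = n - predictors - 1
    adjusted = 1 - (1 - r_squared) * (n - 1) / residual_df
    residual_ss = syy * (1 - r_squared)  # unexplained variation in y
    std_error = math.sqrt(residual_ss / residual_df)
    return r_squared, adjusted, std_error


r_squared, adjusted_r_squared, std_error = model_summary(male, female)
print(f"R2 = {r_squared:.3f}, adjusted R2 = {adjusted_r_squared:.3f}, "
      f"std. error = {std_error:.4f}")
```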


Table 3

ANOVA
Model 1       Sum of Squares   df   Mean Square   F         Sig.
Regression    70.828           1    70.828        217.010   .000
Residual      11.097           34   .326
Total         81.926           35

The next table is the ANOVA table. It indicates that the regression model predicts the outcome
variable significantly well: the Sig. column value for the regression is .000, i.e. p < 0.0005, which
is less than 0.05. Overall, therefore, the model applied is a statistically significant predictor of the
outcome variable.
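
The entries of the ANOVA table follow from splitting the total variation in female unemployment into a part explained by the regression and a residual part. The sketch below recomputes the sums of squares and the F ratio for a one-predictor model; it does not compute the p-value, which would need an F-distribution routine from a statistics library.

```python
# Annual unemployment rates (%), 1975-2010, copied from the data table.
male = [15.1, 12.8, 12.8, 12.8, 15.3, 14.2, 13.8, 14.1, 16.8, 15.2, 15.4, 16.4,
        15.2, 15.2, 14.6, 14.0, 13.2, 13.1, 12.9, 12.8, 14.4, 15.3, 15.0, 14.2,
        14.9, 17.9, 17.1, 16.3, 15.3, 15.1, 16.9, 17.4, 19.9, 19.9, 17.4, 17.0]
female = [15.7, 14.4, 13.6, 13.3, 16.0, 14.9, 14.8, 14.7, 16.8, 15.9, 15.9, 17.2,
          16.2, 16.5, 16.2, 15.5, 14.8, 15.2, 14.8, 14.7, 15.9, 16.9, 16.6, 16.0,
          16.7, 19.3, 18.6, 18.2, 17.2, 16.8, 17.4, 17.9, 19.4, 19.2, 17.6, 17.4]

n = len(male)
mean_x = sum(male) / n
mean_y = sum(female) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(male, female))
sxx = sum((x - mean_x) ** 2 for x in male)

ss_total = sum((y - mean_y) ** 2 for y in female)  # total variation in y
ss_regression = sxy ** 2 / sxx                     # explained by the line
ss_residual = ss_total - ss_regression             # left unexplained

df_regression = 1            # one predictor
df_residual = n - 2          # n minus slope and intercept
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

print(f"SS: regression {ss_regression:.3f}, residual {ss_residual:.3f}, "
      f"total {ss_total:.3f}")
print(f"F({df_regression}, {df_residual}) = {f_ratio:.2f}")
```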

Table 4

Coefficients
                      Unstandardized Coefficients   Standardized Coefficients
Model 1               B        Std. Error           Beta                        t        Sig.
(Constant)            4.632    .800                                             5.787    .000
Male Unemployment     .767     .052                 .930                        14.731   .000

The table above, Coefficients, provides us with information on each predictor variable. This gives us
the information we need to predict female-unemployment from male-unemployment. We can see that
both the constant and male-unemployment contribute significantly to the model (by looking at
the Sig. column). By looking at the B column under the Unstandardized Coefficients column, we can present
the regression equation as:

Female-unemployment = 4.632 + 0.767(Male-unemployment)
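
As a usage illustration, the fitted equation can be wrapped in a small function. The constants are the B values from Table 4; the function name and the example input rates are hypothetical and chosen only for demonstration.

```python
INTERCEPT = 4.632  # (Constant) from the Coefficients table
SLOPE = 0.767      # coefficient on male unemployment


def predict_female_rate(male_rate):
    """Predicted female unemployment rate (%) for a given male rate (%)."""
    return INTERCEPT + SLOPE * male_rate


# Example male rates inside the observed range (12.8 to 19.9).
for rate in (13.0, 15.0, 17.0):
    print(f"male {rate:.1f}% -> predicted female {predict_female_rate(rate):.2f}%")
```

Predictions are best restricted to male rates within the observed range, since the line was fitted only to those values.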

Trend Line Graph Of Unemployment

[Line chart: Male and Female unemployment rates (vertical axis: Unemployment) by Year (horizontal axis: 1975-2010), each with a linear trend line]
Linear (Female): y = 0.1094x + 14.316, R² = 0.5672
Linear (Male): y = 0.1103x + 13.229, R² = 0.3923

A linear trend line is a best-fit straight line that is used with simple linear data sets. Data are linear if the pattern in the data points
resembles a line. A linear trend line usually shows that something is increasing or decreasing at a steady rate.
In the data set, the linear trend lines show that both male and female unemployment move in a consistent upward direction over the
years. However, the R-squared values (the fraction of variance explained by the model) are low, so a straight line is not a good fit to
the data.
This might have occurred because we have used annual data, which can mask any seasonal and cyclical effects; a better
line of fit might have been possible with monthly or quarterly data.
Further time series modeling is not possible, or rather not meaningful, because seasonal decomposition cannot be carried out
on annual data; we suspect that a seasonal component is present because the R-squared values are low.
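
The trend-line equations can be reproduced with an ordinary least-squares fit against a time index. The sketch below assumes the index runs x = 1 for 1975 up to x = 36 for 2010, which is how spreadsheet trend lines on a category axis are usually numbered; that indexing, and the function name, are assumptions rather than details stated in the report.

```python
# Annual unemployment rates (%), 1975-2010, in year order.
male = [15.1, 12.8, 12.8, 12.8, 15.3, 14.2, 13.8, 14.1, 16.8, 15.2, 15.4, 16.4,
        15.2, 15.2, 14.6, 14.0, 13.2, 13.1, 12.9, 12.8, 14.4, 15.3, 15.0, 14.2,
        14.9, 17.9, 17.1, 16.3, 15.3, 15.1, 16.9, 17.4, 19.9, 19.9, 17.4, 17.0]
female = [15.7, 14.4, 13.6, 13.3, 16.0, 14.9, 14.8, 14.7, 16.8, 15.9, 15.9, 17.2,
          16.2, 16.5, 16.2, 15.5, 14.8, 15.2, 14.8, 14.7, 15.9, 16.9, 16.6, 16.0,
          16.7, 19.3, 18.6, 18.2, 17.2, 16.8, 17.4, 17.9, 19.4, 19.2, 17.6, 17.4]


def linear_trend(values):
    """Fit value = slope * t + intercept with t = 1..n; also return R squared."""
    n = len(values)
    t = list(range(1, n + 1))  # assumed time index: 1 for the first year
    mean_t = sum(t) / n
    mean_v = sum(values) / n
    stv = sum((a - mean_t) * (b - mean_v) for a, b in zip(t, values))
    stt = sum((a - mean_t) ** 2 for a in t)
    svv = sum((b - mean_v) ** 2 for b in values)
    slope = stv / stt
    intercept = mean_v - slope * mean_t
    r_squared = stv ** 2 / (stt * svv)
    return slope, intercept, r_squared


male_trend = linear_trend(male)
female_trend = linear_trend(female)
for name, (slope, intercept, r2) in (("Male", male_trend), ("Female", female_trend)):
    print(f"{name}: y = {slope:.4f}x + {intercept:.3f}, R2 = {r2:.4f}")
```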


5. GENERAL DISCUSSION
From our results we can conclude the following.

According to our data set, the male and female unemployment rates move in a similar pattern and the gap
between them has narrowed in recent years, so each rate can be used to predict the other.

Our preferred explanations focus on the restrictions on the set of available jobs that are acceptable to
women, mainly due to the presence of young children that create frictions to their employment.
When mothers return to work after childbirth, they have to search the set of available vacancies, which
takes time and effort. But many firms have increased workplace assistance that helps mothers of young
children return to the previous firm in typical work, and these offers are immediately apparent without
the need for job search. So returning mothers, on average, now face fewer frictions in finding work after
childbirth. There is also evidence that new jobs taken by women are increasingly likely to continue into
a second year, which would also lower the inflow rate into unemployment. These pieces of evidence
may be consistent with a lowering of the natural rate of female unemployment, although of course that
is only one interpretation.

In order to get a better view of how the two variables behave, a further analysis could be done by obtaining
monthly or quarterly data and by considering other factors that affect employment, such as education
level, health and inflation.