ITLS5050 Data Set 2 Workers Vs Production
Scatter plot of weekends versus production
This scatter plot shows the relationship between weekend shifts and the amount of production in the sample.
Trend line: f(x) = 3.92698412698413 x + 331.295238095238
R² = 0.00153260662770005
(Chart: x-axis weekends, y-axis Production)
Scatter plot of managers versus production
This scatter plot shows the relationship between the number of managers and the amount of production in the sample.
(Chart: x-axis managers, y-axis Production)
Scatter plot of shifts (categorical variable) versus production
This scatter plot shows the relationship between the shift (1 = morning, 2 = afternoon, 3 = evening) and the amount of production in the sample.
Trend line: f(x) = -4.21 x + 340.893333333333
R² = 0.00559198561463092
(Chart: x-axis shifts, y-axis Production)
Scatter plot of machines versus production
This scatter plot shows the relationship between the number of machines and the amount of production in the sample.
(Chart: x-axis machines, y-axis Production)
Scatter plot of workers versus production
This scatter plot shows the relationship between the number of workers and the amount of production in the sample.
(Chart: x-axis Workers, y-axis Production)
Scatter plot of workers versus production
The 95% confidence interval we built based on the descriptive statistics suggests that we can expect production to be somewhere between the red lines 95% of the time.
(Chart: x-axis Workers, y-axis Production; red lines mark the upper boundary and the lower boundary)
Scatter plot of workers versus production
Production is highest when there are many workers and lowest when there are few workers.
If we take that into account, we could make better forecasts for production.
(Chart: x-axis Workers, y-axis Production)
Scatter plot of workers versus production
Production is highest when there are many workers and lowest when there are few workers.
The trend line shows that relationship.
We can expect production to be along the trend line.
This is the equation of the trend line:
f(x) = 4.01996518980369 x + 75.329560025557
R² = 0.385696593213043
(Chart: x-axis Workers, y-axis Production, with the trend line drawn through the points)
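As a rough illustration of what "production along the trend line" means, the sketch below plugs a number of workers into the trend line equation from the chart. The function name `predict_production` and the worker counts tried are illustrative choices, not part of the original workbook; the slope and intercept are copied from the equation above.

```python
# Coefficients copied from the trend line equation on the chart.
SLOPE = 4.01996518980369      # extra units of production per additional worker
INTERCEPT = 75.329560025557   # fitted value at zero workers (outside the observed range)


def predict_production(workers: float) -> float:
    """Return the production the trend line predicts for a number of workers."""
    return SLOPE * workers + INTERCEPT


if __name__ == "__main__":
    # Illustrative worker counts inside the observed range of 50 to 78.
    for workers in (50, 60, 70):
        print(f"{workers} workers -> {predict_production(workers):.1f} units")
```

With an R² of about 0.39, the line captures only part of the variation, so individual shifts can sit well above or below these values.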
            Shift  Weekend  Production  Workers  Managers  Machines  Breakdown
Weekend      0.00
Production  -0.07     0.04
Workers     -0.17     0.00        0.62
Managers    -0.18    -0.02        0.56     0.96
Machines     0.04     0.07        0.77     0.24      0.18
Breakdown   -0.07    -0.03       -0.07     0.03      0.04     -0.02
Delivery     0.05    -0.04        0.03     0.05      0.01     -0.01       0.01

The correlation coefficient tells us the strength of the linear relationship in the sample.
A correlation value close to 0 means there is little or no linear relationship between the two variables in the sample.
A correlation value close to 1 means there is a strong positive relationship between the two variables in the sample.
A correlation value close to -1 means there is a strong negative relationship between the two variables in the sample.
We would like to see a strong correlation between some of the variables and production, because a strong correlation means we might be able to use that variable to improve our forecasts for production.
There are strong positive correlations between workers and production (0.62), between managers and production (0.56), and between machines and production (0.77).
We can expect workers, managers and machines to influence production, and we could take this information into account in building our forecasts for production.
We hope not to see strong correlations between pairs of other variables, because a strong correlation there would mean that we cannot include both variables in our models to forecast production.
This is called multicollinearity.
Anything above 0.95 or below -0.95 means that multicollinearity would be a severe issue for our model.
The correlation between workers and managers is 0.96. That means that we should not include both workers and managers in the same model when building forecasts for production.
We could combine the variables somehow or leave one of the variables out.
I will choose to leave out managers, as I think the relationship between workers and production is more important than the relationship between managers and production.
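The multicollinearity check described above can be sketched in a few lines: scan every pair of candidate predictors and flag any pair whose correlation is beyond the 0.95 threshold. The values are copied from the correlation table; the names `ROWS`, `COLUMNS` and `collinear_pairs` are illustrative and do not come from the workbook.

```python
# Lower triangle of the correlation matrix, copied from the table.
# Each row lists its correlations with the columns in COLUMNS, left to right.
COLUMNS = ["Shift", "Weekend", "Production", "Workers", "Managers", "Machines", "Breakdown"]
ROWS = {
    "Weekend": [0.00],
    "Production": [-0.07, 0.04],
    "Workers": [-0.17, 0.00, 0.62],
    "Managers": [-0.18, -0.02, 0.56, 0.96],
    "Machines": [0.04, 0.07, 0.77, 0.24, 0.18],
    "Breakdown": [-0.07, -0.03, -0.07, 0.03, 0.04, -0.02],
    "Delivery": [0.05, -0.04, 0.03, 0.05, 0.01, -0.01, 0.01],
}


def collinear_pairs(rows, columns, threshold=0.95, target="Production"):
    """Return predictor pairs whose absolute correlation exceeds the threshold."""
    pairs = []
    for row_name, values in rows.items():
        for col_name, r in zip(columns, values):
            # Correlations with the target are wanted, so they are skipped here.
            if target in (row_name, col_name):
                continue
            if abs(r) > threshold:
                pairs.append((col_name, row_name, r))
    return pairs


if __name__ == "__main__":
    for first, second, r in collinear_pairs(ROWS, COLUMNS):
        print(f"{first} and {second}: r = {r:.2f}")
```

On this table the only flagged pair is workers and managers, which matches the decision to keep just one of the two.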
                    Number   Shift  Weekend  Production  Workers  Managers  Machines  Breakdown  Delivery
Mean                 75.50    2.00     0.30      332.47    63.97     13.26     30.23       0.05      0.11
Standard Error        3.55    0.07     0.04        3.77     0.58      0.12      0.42       0.02      0.03
Median               75.50    2.00     0.00      332.00    65.00     13.00     30.00       0.00      0.00
Mode                  #N/A    1.00     0.00      351.00    70.00     14.00     29.00       0.00      0.00
Standard Deviation   43.45    0.82     0.46       46.12     7.13      1.48      5.10       0.23      0.31
Sample Variance    1887.50    0.67     0.21     2127.22    50.77      2.19     25.96       0.05      0.10
Kurtosis             -1.20   -1.51    -1.24        0.45    -1.02     -1.04     -0.19      14.32      4.69
Skewness              0.00    0.00     0.88        0.28    -0.17     -0.09      0.05       4.02      2.57
Range               149.00    2.00     1.00      251.00    28.00      6.00     26.00       1.00      1.00
Minimum               1.00    1.00     0.00      215.00    50.00     10.00     17.00       0.00      0.00
Maximum             150.00    3.00     1.00      466.00    78.00     16.00     43.00       1.00      1.00
Sum               11325.00  300.00    45.00    49871.00  9595.00   1989.00   4534.00       8.00     16.00
Count               150.00  150.00   150.00      150.00   150.00    150.00    150.00     150.00    150.00

Mean 332.47
Standard deviation 46.12
2 times standard deviation 92.24
We are 95% confident that production in a particular shift will be between 240 and 425 units, that is, within 2 standard deviations of the mean.
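The range quoted above is the mean plus or minus two standard deviations. The short sketch below reproduces that arithmetic from the descriptive statistics; the function name `two_sd_range` is illustrative. Note that reading this range as covering about 95% of shifts relies on production being roughly normally distributed, which is an assumption rather than something the table proves.

```python
# Values copied from the Production column of the descriptive statistics.
MEAN = 332.47
STD_DEV = 46.12


def two_sd_range(mean: float, std_dev: float) -> tuple[float, float]:
    """Return (lower, upper) bounds two standard deviations either side of the mean."""
    margin = 2 * std_dev
    return mean - margin, mean + margin


if __name__ == "__main__":
    lower, upper = two_sd_range(MEAN, STD_DEV)
    print(f"Expected production range: {lower:.0f} to {upper:.0f} units per shift")
```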
Histogram of production
(Chart: x-axis Production, in bins of 25 starting at 200; y-axis Frequency. Frequency table, partial: 450: 5; 475: 1; 500: 0.)
… sample of data.
Based on the descriptive statistics, we can expect that 95% of the time production will be between 240 and 425 units per shift (indicated by the red oval).
Forecasts
This page shows how our forecast for production varies as we build more complex models.
We will predict production for a weekday afternoon shift with 60 workers on duty and 30 production machines in operation wi…
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.621243080046757   (correlation between workers and production)
R Square            0.385942964505981   (percentage of variation in production explained by workers, about 38.6%)
Adjusted R Square   0.381765705761124
Standard Error      36.3746789728139    (typical size of the forecast error, in units of production; smaller is better)
Observations        149

ANOVA
             df   SS                 MS                 F                  Significance F
Regression     1  122244.714274969   122244.714274969   92.3914433074383   2.8385729806E-17
Residual     147  194498.238745165   1323.11727037527
Total        148  316742.953020134

Significance F answers the question: does the model help to explain production? A value under 5% means the model does help to explain production. Here the value is far below 5%, so the number of workers helps to explain production.
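To show how the headline figures in the summary output relate to each other, the sketch below recomputes R Square, Adjusted R Square, Standard Error and F from the sums of squares and degrees of freedom in the ANOVA table. The function name `regression_summary` is illustrative; Significance F is not recomputed here because it needs the F distribution, which the Python standard library does not provide.

```python
import math

# Values copied from the ANOVA table.
SS_REGRESSION = 122244.714274969
SS_RESIDUAL = 194498.238745165
DF_REGRESSION = 1
DF_RESIDUAL = 147


def regression_summary(ss_reg: float, ss_res: float, df_reg: int, df_res: int) -> dict:
    """Derive the main regression statistics from the ANOVA sums of squares."""
    ss_total = ss_reg + ss_res
    ms_reg = ss_reg / df_reg          # mean square for the regression
    ms_res = ss_res / df_res          # mean square for the residuals
    n = df_reg + df_res + 1           # number of observations
    r_square = ss_reg / ss_total
    return {
        "r_square": r_square,
        "adjusted_r_square": 1 - (1 - r_square) * (n - 1) / df_res,
        "standard_error": math.sqrt(ms_res),
        "f_statistic": ms_reg / ms_res,
        "observations": n,
    }


if __name__ == "__main__":
    for name, value in regression_summary(
        SS_REGRESSION, SS_RESIDUAL, DF_REGRESSION, DF_RESIDUAL
    ).items():
        print(f"{name}: {value}")
```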
Variable          Definition
Number            Observation number
Shift             Time of day. One indicates a morning shift, two indicates an afternoon shift, three indicates an evening shift.
Weekend           Weekend shifts are given a value of one.
Production        Number of air conditioners produced
Factory Workers   Number of workers on duty
Managers          Number of managers on duty
Machines          Number of production machines online
Breakdown         A value of one indicates that at least one machine broke down.
Delivery          A value of one indicates that a delivery of parts occurred.
Afternoon         A value of one indicates an afternoon shift.
Scatter plot Correlation
No obvious relationship
Production seems higher in the afternoon No obvious relationship
No obvious relationship No obvious relationship
Remember you are free to clean the data differently. You just need to be able to justify your choices.
Reasoning
There is no shift 4. The pattern shows this should be shift 1.
It seems reasonable to round this value to the nearest integer.
It seems reasonable to round this value to the nearest integer.
This is the mode and seems reasonable.
This is the mode and seems reasonable.
We assume this is the correct value for the binary variable.
We can determine the correct value from the pattern of data.
We can determine the correct value from the pattern of data.
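The cleaning rules listed above could be sketched roughly as follows. Everything here is hypothetical: the function names, the sample values, and in particular the assumption that shifts cycle 1, 2, 3 with the observation number, which is one way to read "the pattern shows this should be shift 1". Other justified choices are equally valid.

```python
from statistics import mode


def expected_shift(observation_number: int) -> int:
    """Assumed pattern: shifts repeat 1, 2, 3 in observation order."""
    return (observation_number - 1) % 3 + 1


def clean_shift(observation_number: int, shift: int) -> int:
    """Keep a valid shift code; replace an invalid one using the assumed pattern."""
    return shift if shift in (1, 2, 3) else expected_shift(observation_number)


def clean_count(value, observed_values):
    """Round a fractional count; replace a missing count with the mode."""
    if value is None:
        return mode(observed_values)
    # Python's round() sends exact halves to the nearest even integer.
    return round(value)


if __name__ == "__main__":
    # Hypothetical records, not taken from the data set.
    print(clean_shift(4, 4))                  # invalid shift code 4
    print(clean_count(64.6, [70, 70, 65]))    # fractional number of workers
    print(clean_count(None, [70, 70, 65]))    # missing number of workers
```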