
Scatter plot of weekends versus production
This scatter plot shows the relationship between weekend shifts and the amount of production in the sample.
[Chart: Production (y-axis, 200 to 500) against weekends (x-axis).]
Trend line: f(x) = 3.92698412698413 x + 331.295238095238
R² = 0.00153260662770005
Scatter plot of managers versus production
This scatter plot shows the relationship between the number of managers and the amount of production in the sample.
[Chart: Production (y-axis, 200 to 500) against managers (x-axis, 9 to 18).]
Trend line: f(x) = 17.3026372147097 x + 103.040363866283
R² = 0.308736683241628
Scatter plot of shifts (categorical variable) versus production
This scatter plot shows the relationship between the shift and the amount of production in the sample.
[Chart: Production (y-axis, 200 to 500) against shifts (x-axis: 1 = morning, 2 = afternoon, 3 = evening).]
Trend line: f(x) = − 4.21 x + 340.893333333333
R² = 0.00559198561463092
Scatter plot of machines versus production
This scatter plot shows the relationship between the number of machines and the amount of production in the sample.
[Chart: Production (y-axis, 200 to 500) against machines (x-axis, 15 to 45).]
Trend line: f(x) = 7.00409827589773 x + 120.762789447198
R² = 0.598722687680407
Scatter plot of workers versus production
This scatter plot shows the relationship between the number of workers and the amount of production in the sample.
[Chart: Production (y-axis, 200 to 500) against Workers (x-axis, 45 to 80).]
Scatter plot of workers versus production
The 95% confidence interval we built based on the descriptive statistics suggests that we can expect production to be somewhere between the red lines 95% of the time.
[Chart: Production (y-axis, 200 to 500) against Workers (x-axis, 45 to 80), with three horizontal lines: upper boundary, forecast line and lower boundary.]
Scatter plot of workers versus production
Production is highest when there are many workers and lowest when there are few workers.
If we take that into account, we could make better forecasts for production.
[Chart: Production (y-axis, 200 to 500) against Workers (x-axis, 45 to 80).]
Scatter plot of workers versus production
Production is highest when there are many workers and lowest when there are few workers.
The trend line shows that relationship.
We can expect production to be along the trend line.
[Chart: Production (y-axis, 200 to 500) against Workers (x-axis, 45 to 80), with the trend line drawn through the points.]
This is the equation of the trend line:
f(x) = 4.01996518980369 x + 75.329560025557
R² = 0.385696593213043
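As a rough illustration of how a trend line like this is obtained, the sketch below fits a least-squares line using only the standard library. It uses just the first 12 shifts of the data set, so its coefficients will not match the chart, which is based on the whole sample. The function name fit_trend_line is made up for this example.

```python
# Minimal least-squares trend line, standard library only.
# Assumption: only the first 12 shifts of the data set are used here,
# so the result differs from the chart built on the full sample.
workers = [64, 68, 59, 71, 72, 65, 69, 73, 56, 57, 55, 53]
production = [347, 345, 286, 316, 466, 303, 300, 415, 331, 241, 277, 318]


def fit_trend_line(x, y):
    """Return (slope, intercept, r_squared) for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_trend_line(workers, production)
print(f"f(x) = {slope:.2f} x + {intercept:.2f}, R² = {r_squared:.3f}")
```

The slope is the extra production associated with one more worker in this small subset, and R² is the share of the variation in production that the line explains.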
Correlation matrix

            Shift  Weekend  Production  Workers  Managers  Machines  Breakdown
Weekend      0.00
Production  -0.07     0.04
Workers     -0.17     0.00        0.62
Managers    -0.18    -0.02        0.56     0.96
Machines     0.04     0.07        0.77     0.24      0.18
Breakdown   -0.07    -0.03       -0.07     0.03      0.04     -0.02
Delivery     0.05    -0.04        0.03     0.05      0.01     -0.01       0.01

The correlation coefficient tells us the strength of the relationship in the sample.
A correlation value close to 0 means there is little or no linear relationship between the two variables in the sample.
A correlation value close to 1 means there is a strong positive relationship between the two variables in the sample. We can expect both variables to increase or decrease together.
A correlation value close to -1 means there is a strong negative relationship between the two variables in the sample. We can expect the variables to move in opposite directions.

We would like to see a strong correlation between some of the variables and production, because a strong correlation means we might be able to use that variable to improve our forecasts for production.
There are strong positive correlations between workers and production (0.62); between managers and production (0.56); and between machines and production (0.77).
We can expect workers, managers and machines to influence production and we could take this information into account in building our forecasts for production.

We hope not to see strong correlations between pairs of other variables, because a strong correlation there would mean that we cannot include both variables in our models to forecast production.
This is called multicollinearity.
Anything above 0.95 or below -0.95 means that multicollinearity would be a severe issue for our model.
The correlation between workers and managers is 0.96. That means that we should not include both workers and managers as separate variables in our model for production.
We could combine the variables somehow or leave one of the variables out.
I will choose to leave out managers as I think the relationship between workers and production is more important than the relationship between managers and production.
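The multicollinearity rule above can be sketched as a small check. The correlations are the rounded values from the matrix; the function name flag_multicollinearity and the dictionary layout are only illustrative choices, not part of the workbook.

```python
# Rounded correlations copied from the matrix above.
correlations = {
    ("Production", "Workers"): 0.62,
    ("Production", "Managers"): 0.56,
    ("Production", "Machines"): 0.77,
    ("Workers", "Managers"): 0.96,
    ("Workers", "Machines"): 0.24,
    ("Managers", "Machines"): 0.18,
}


def flag_multicollinearity(pairs, target="Production", threshold=0.95):
    """Return pairs of explanatory variables whose |correlation| exceeds the threshold.

    Pairs that involve the target variable are skipped, because a strong
    correlation with the target is what we are hoping to find.
    """
    return [
        (a, b, r)
        for (a, b), r in pairs.items()
        if target not in (a, b) and abs(r) > threshold
    ]


flagged = flag_multicollinearity(correlations)
for a, b, r in flagged:
    print(f"{a} and {b}: r = {r:.2f}, keep only one of them")
```

With the 0.95 threshold, only the workers and managers pair is flagged.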
Descriptive statistics

                      Number   Shift  Weekend  Production  Workers  Managers  Machines  Breakdown  Delivery
Mean                   75.50    2.00     0.30      332.47    63.97     13.26     30.23       0.05      0.11
Standard Error          3.55    0.07     0.04        3.77     0.58      0.12      0.42       0.02      0.03
Median                 75.50    2.00     0.00      332.00    65.00     13.00     30.00       0.00      0.00
Mode                    #N/A    1.00     0.00      351.00    70.00     14.00     29.00       0.00      0.00
Standard Deviation     43.45    0.82     0.46       46.12     7.13      1.48      5.10       0.23      0.31
Sample Variance      1887.50    0.67     0.21     2127.22    50.77      2.19     25.96       0.05      0.10
Kurtosis               -1.20   -1.51    -1.24        0.45    -1.02     -1.04     -0.19      14.32      4.69
Skewness                0.00    0.00     0.88        0.28    -0.17     -0.09      0.05       4.02      2.57
Range                 149.00    2.00     1.00      251.00    28.00      6.00     26.00       1.00      1.00
Minimum                 1.00    1.00     0.00      215.00    50.00     10.00     17.00       0.00      0.00
Maximum               150.00    3.00     1.00      466.00    78.00     16.00     43.00       1.00      1.00
Sum                 11325.00  300.00    45.00    49871.00  9595.00   1989.00   4534.00       8.00     16.00
Count                 150.00  150.00   150.00      150.00   150.00    150.00    150.00     150.00    150.00

Confidence interval for production:

If the data is normally distributed then we are 95% confident that production in a particular shift will be within 2 standard deviations of the mean.

Mean                             332.47
Standard deviation                46.12
2 times standard deviation        92.24

Lower 95% confidence interval    240.23
Upper 95% confidence interval    424.72

We are 95% confident that production in a particular shift will be between 240 and 425 units.
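The interval above is simply the mean plus or minus two standard deviations. A minimal sketch, assuming we rebuild the unrounded mean from Sum / Count and the standard deviation from the Sample Variance row of the table:

```python
import math

# Values taken from the Production column of the descriptive statistics.
production_sum = 49871.0
count = 150
sample_variance = 2127.22  # rounded to two decimals in the table

mean = production_sum / count
std_dev = math.sqrt(sample_variance)

# Rule of thumb used above: about 95% of a normal distribution lies
# within 2 standard deviations of the mean.
lower = mean - 2 * std_dev
upper = mean + 2 * std_dev

print(f"mean = {mean:.2f}, standard deviation = {std_dev:.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

Using the rounded 332.47 and 46.12 directly gives an upper bound of 424.71 rather than 424.72; the small gap is rounding only.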


Histogram of production

Bin   Frequency
200           0
225           2
250           4
275           5
300          19
325          36
350          40
375          20
400           9
425           9
450           5
475           1
500           0

[Chart: histogram of production. Production (x-axis, bins 200 to 500) against Frequency (y-axis, 0 to 45).]

This is the distribution of production in the sample of data.
Based on the confidence interval from the descriptive statistics, we can expect that 95% of the time production will be between 240 and 425 units per shift (indicated by the red oval).
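We can roughly check the 95% claim against the histogram counts. This sketch assumes each bin counts the values above the previous bin edge up to and including its own edge. The bin edges do not line up with 240 exactly, so the whole 225 to 250 bin is counted as inside the interval and the result is only approximate.

```python
# Bin upper edges and frequencies from the histogram table above.
bins = [200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500]
frequency = [0, 2, 4, 5, 19, 36, 40, 20, 9, 9, 5, 1, 0]

total = sum(frequency)

# Assumption: a bin labelled 250 holds values in (225, 250], and so on.
# Bins with edges from 250 up to 425 approximate the 240 to 425 interval.
inside = sum(f for edge, f in zip(bins, frequency) if 225 < edge <= 425)
share = inside / total

print(f"{inside} of {total} shifts ({share:.1%}) fall in the bins covering the interval")
```

The share comes out close to 95%, which is consistent with the interval from the descriptive statistics.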
Forecasts

This page shows how our forecast for production varies as we build more complex models.
We will predict production for a weekday afternoon shift with 60 workers on duty and 30 production machines in operation with no breakdowns or deliveries.

The forecast is the most likely outcome.
The confidence interval gives us a range of probable outcomes around that forecast.

Just descriptive statistics
                      Value   Forecast   95% CI lower   95% CI upper
Mean                  332.5      332.5          240.2          424.7
Standard deviation     46.1
We are 95% confident that production will be within this range.

Trend line
Variable    Coefficient   Forecast value
Intercept         75.33                1
Workers            4.02               60
Forecast: 316.5

Lines for chart
X     Lower   Middle   Upper
0     240.2    332.5   424.7
100   240.2    332.5   424.7
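The trend line forecast is each coefficient multiplied by its forecast value, then added up. A minimal sketch using the rounded coefficients from the table (the function name trend_forecast is just for illustration):

```python
def trend_forecast(workers, intercept=75.33, slope=4.02):
    """Forecast production from the workers trend line: intercept * 1 + slope * workers."""
    return intercept * 1 + slope * workers


mean_forecast = 332.5                # forecast from descriptive statistics alone
line_forecast = trend_forecast(60)   # forecast for a shift with 60 workers

print(f"Descriptive statistics forecast: {mean_forecast:.1f}")
print(f"Trend line forecast for 60 workers: {line_forecast:.1f}")  # 316.5
```

Because 60 workers is below the average of about 64, the trend line forecast is lower than the overall mean.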
Correlation matrix (full precision)

            Number     Shift      Weekend              Production  Workers       Managers  Machines   Breakdown  Delivery
Number      1
Shift       0.018857   1
Weekend     -0.052916  0          1
Production  -0.039136  -0.07478   0.0391485201214558   1
Workers     -0.03006   -0.167862  -0.0010242713975372  0.621045    1
Managers    -0.044588  -0.182529  -0.0167537779952965  0.555641    0.9579236725  1
Machines    -0.044204  0.043411   0.0681807823853839   0.773772    0.2431142434  0.184232  1
Breakdown   -0.106207  -0.072675  -0.0258976989621835  -0.066988   0.0303593187  0.03859   -0.016437  1
Delivery    0.021946   0.052901   -0.0377023090543705  0.031209    0.0472387306  0.012289  -0.011171  0.014097   1
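For a trend line with a single explanatory variable, R² is the square of the correlation coefficient. The sketch below checks that the Production correlations in this matrix agree with the R² values printed on the scatter plots. The tolerance is an assumption that allows for the correlations being shown to about six decimals.

```python
# (correlation with Production from the matrix, R² from the matching scatter plot)
pairs = {
    "weekends": (0.0391485201214558, 0.00153260662770005),
    "shifts": (-0.07478, 0.00559198561463092),
    "workers": (0.621045, 0.385696593213043),
    "managers": (0.555641, 0.308736683241628),
    "machines": (0.773772, 0.598722687680407),
}

for name, (r, chart_r_squared) in pairs.items():
    print(f"{name:9s} r² = {r * r:.6f}   chart R² = {chart_r_squared:.6f}")

# Largest disagreement between r squared and the chart's R².
max_gap = max(abs(r * r - chart_r_squared) for r, chart_r_squared in pairs.values())
print(f"largest gap: {max_gap:.1e}")
```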
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.621243080046757   (correlation)
R Square            0.385942964505981   (percentage of variation explained by workers)
Adjusted R Square   0.381765705761124
Standard Error      36.3746789728139    (how confident we are in the forecast; smaller is better)
Observations        149

ANOVA
             df    SS                 MS                 F                  Significance F
Regression   1     122244.714274969   122244.714274969   92.3914433074383   2.8385729806E-17
Residual     147   194498.238745165   1323.11727037527
Total        148   316742.953020134

Does the model help to explain production? If Significance F is under 5%, the model does help to explain production. Here it is far below 5%.

               Coefficients       Standard Error      t Stat             P-value             Lower 95%          Upper 95%          Lower 95.0%        Upper 95.0%
Intercept      75.237048866177    26.9171792326795    2.79513125115404   0.005880761860991   22.0424226850503   128.431675047304   22.0424226850503   128.431675047304
64 (Workers)   4.01990134497321   0.418214916847414   9.61204678033967   2.83857298058E-17   3.19341109602246   4.84639159392395   3.19341109602246   4.84639159392395

Forecasted production = 75 + 4 * workers.
In the sample we collected, each worker adds 4 to production.
Do workers help to explain production? If the P-value is under 5%, then workers help to explain production.
The P-value says that each worker does add to production, but we can't be sure how much. We are 95% confident that an extra worker will add between 3.2 and 4.8 to production.
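The figures in the summary output are tied together by a few simple formulas, which we can use to check that we have read the output correctly. This is only a consistency check on the printed numbers, not a re-run of the regression.

```python
import math

# Values copied from the ANOVA and coefficient tables above.
ss_regression = 122244.714274969
ss_residual = 194498.238745165
ss_total = 316742.953020134
df_regression, df_residual = 1, 147
workers_coefficient = 4.01990134497321
workers_std_error = 0.418214916847414

# R Square: share of the total variation explained by the model.
r_squared = ss_regression / ss_total

# F: mean square of the regression divided by mean square of the residuals.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

# t Stat: coefficient divided by its standard error.
t_workers = workers_coefficient / workers_std_error

# Standard Error of the regression: square root of the residual mean square.
standard_error = math.sqrt(ss_residual / df_residual)

print(f"R Square       = {r_squared:.6f}")
print(f"F              = {f_stat:.4f}")
print(f"t Stat         = {t_workers:.4f}")
print(f"Standard Error = {standard_error:.4f}")
```

With a single explanatory variable, F equals the square of the t Stat for that variable, which is why Significance F and the P-value for workers are the same.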
Intercept        75.23705
64 (Workers)     4.019901
Standard Error   36.37468

What if we have 50 workers on duty tomorrow morning?
We forecast production will be 75.2 + 4.02 * 50.
Forecast production: 276.232116114837

Confidence interval: I am 95% confident that the production will be within 2 standard errors of my forecast.
95% confidence interval, lower: 203.482758169209
95% confidence interval, upper: 348.981474060465
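The forecast for 50 workers and its interval can be reproduced from the regression output. This sketch uses the same rule of thumb as above, the forecast plus or minus 2 standard errors; the function name forecast_with_interval is made up for the example.

```python
# Values from the regression summary output.
intercept = 75.237048866177
workers_coefficient = 4.01990134497321
standard_error = 36.3746789728139


def forecast_with_interval(workers):
    """Return (lower, forecast, upper), using forecast ± 2 standard errors."""
    forecast = intercept + workers_coefficient * workers
    margin = 2 * standard_error
    return forecast - margin, forecast, forecast + margin


lower, forecast, upper = forecast_with_interval(50)
print(f"forecast = {forecast:.1f}, 95% interval = {lower:.1f} to {upper:.1f}")
```

The interval is about 145 units wide, compared with about 184 units for the interval built from the descriptive statistics alone.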
Number Shift Weekend Production Workers Managers Machines Breakdown Delivery
1 1 1 347 64 14 31 0 0
2 2 1 345 68 15 30 0 0
3 3 1 286 59 13 29 0 0
4 1 1 316 71 15 31 1 0
5 2 1 466 72 15 41 0 0
6 3 1 303 65 13 30 0 0
7 1 0 300 69 14 23 0 1
8 2 0 415 73 15 37 0 0
9 3 0 331 56 11 35 0 0
10 1 0 241 57 12 23 0 0
11 2 0 277 55 11 26 1 0
12 3 0 318 53 11 37 0 0
13 1 0 373 64 14 35 0 0
14 2 0 306 60 12 29 0 0
15 3 0 301 67 14 29 0 0
16 1 0 381 68 14 36 0 0
17 2 0 301 72 15 20 1 0
18 3 0 235 55 12 18 0 0
19 1 0 434 77 16 38 0 0
20 2 0 340 62 13 28 0 0
21 3 0 327 64 13 31 0 0
22 1 1 309 58 12 27 0 0
23 2 1 337 71 15 26 0 0
24 3 1 351 62 13 35 0 0
25 1 1 378 75 15 36 0 1
26 2 1 337 59 12 29 0 0
27 3 1 429 73 15 38 0 1
28 1 0 439 73 15 43 0 0
29 2 0 330 70 15 26 0 0
30 3 0 285 60 12 29 0 0
31 1 0 263 52 11 21 0 0
32 2 0 349 69 14 31 0 1
33 3 0 276 54 11 31 0 0
34 1 0 345 65 13 32 0 0
35 2 0 443 78 16 37 0 0
36 3 0 344 76 16 31 0 0
37 1 0 299 67 14 27 0 0
38 2 0 298 50 11 25 0 0
39 3 0 286 57 12 26 0 0
40 1 0 305 52 11 33 1 0
41 2 0 361 64 14 29 0 0
42 3 0 344 67 14 36 0 0
43 1 1 302 57 12 31 0 0
44 2 1 353 62 13 29 0 0
45 3 1 366 64 13 39 0 0
46 1 1 338 69 14 32 0 0
47 2 1 342 58 12 32 0 0
48 3 1 268 55 11 30 0 0
49 1 0 290 64 13 25 0 1
50 2 0 442 75 15 40 0 0
51 3 0 329 57 12 32 0 0
52 1 0 303 54 11 32 0 0
53 2 0 406 75 15 32 0 0
54 3 0 316 60 12 31 0 0
55 1 0 289 73 15 23 0 0
56 2 0 351 68 14 28 0 0
57 3 0 246 50 11 25 0 0
58 1 0 351 74 15 29 0 0
59 2 0 345 68 14 29 0 0
60 3 0 395 70 14 38 1 0
61 1 0 307 70 15 27 0 0
62 2 0 422 68 14 37 0 0
63 3 0 287 61 13 25 0 0
64 1 1 392 74 16 33 0 0
65 2 1 340 54 11 34 0 0
66 3 1 324 73 15 30 0 0
67 1 1 316 69 14 30 0 0
68 2 1 328 66 14 26 0 0
69 3 1 220 51 11 20 0 0
70 1 0 401 70 14 40 0 0
71 2 0 348 60 12 29 0 0
72 3 0 262 57 12 25 0 0
73 1 0 348 65 13 34 0 1
74 2 0 343 57 12 34 0 0
75 3 0 215 54 12 19 0 0
76 1 0 307 65 14 30 0 0
77 2 0 396 69 14 32 0 0
78 3 0 412 77 16 36 0 0
79 1 0 277 59 12 23 0 0
80 2 0 303 61 13 17 0 1
81 3 0 337 55 11 36 0 1
82 1 0 310 54 11 29 0 0
83 2 0 358 66 13 28 0 0
84 3 0 358 70 14 30 0 0
85 1 1 303 54 11 28 0 0
86 2 1 417 72 14 35 0 0
87 3 1 348 59 12 33 0 0
88 1 1 330 73 15 23 0 0
89 2 1 301 62 12 23 0 0
90 3 1 304 55 11 34 1 1
91 1 0 328 70 15 24 0 0
92 2 0 344 61 13 34 0 0
93 3 0 406 69 15 39 0 0
94 1 0 309 73 15 26 0 0
95 2 0 320 62 13 23 0 0
96 3 0 277 60 13 29 0 0
97 1 0 229 50 11 24 0 0
98 2 0 322 54 12 27 0 1
99 3 0 378 70 15 33 0 0
100 1 0 380 65 13 37 0 1
101 2 0 361 62 13 31 0 0
102 3 0 270 59 12 25 0 0
103 1 0 304 70 15 27 1 0
104 2 0 351 58 12 31 0 1
105 3 0 346 68 14 35 0 0
106 1 1 346 71 14 29 0 0
107 2 1 343 60 13 34 0 0
108 3 1 266 66 14 22 0 1
109 1 1 305 68 15 26 0 0
110 2 1 316 54 11 27 0 0
111 3 1 413 73 15 39 0 0
112 1 0 351 70 15 33 0 0
113 2 0 340 58 12 31 0 0
114 3 0 366 70 14 33 0 1
115 1 0 311 55 12 35 0 0
116 2 0 319 63 13 26 0 0
117 3 0 341 65 13 32 0 0
118 1 0 351 69 15 29 0 0
119 2 0 344 60 13 29 0 0
120 3 0 310 65 14 32 0 0
121 1 0 377 70 14 39 0 0
122 2 0 411 65 13 40 0 0
123 3 0 326 66 14 28 0 0
124 1 0 354 74 16 30 1 0
125 2 0 308 51 10 29 0 0
126 3 0 352 73 15 29 0 1
127 1 1 342 58 12 37 0 0
128 2 1 318 61 12 26 0 0
129 3 1 283 55 11 25 0 0
130 1 1 379 70 15 35 0 0
131 2 1 345 69 14 27 0 0
132 3 1 352 54 11 38 0 0
133 1 0 317 72 15 28 0 0
134 2 0 343 65 13 31 0 0
135 3 0 291 56 11 30 0 0
136 1 0 330 67 14 27 0 0
137 2 0 353 70 14 29 0 0
138 3 0 311 67 14 28 0 1
139 1 0 289 64 13 24 0 0
140 2 0 333 71 15 28 0 0
141 3 0 347 66 13 32 0 0
142 1 0 340 64 13 30 0 0
143 2 0 319 62 13 30 0 0
144 3 0 365 72 15 30 0 0
145 1 0 288 50 11 34 0 0
146 2 0 276 60 12 23 0 0
147 3 0 297 54 12 29 0 0
148 1 1 303 71 14 27 0 0
149 2 1 372 57 12 40 0 0
150 3 1 306 67 14 27 0 0
Variable          Definition
Number            Observation number
Shift             Time of day. One indicates a morning shift, two indicates an afternoon shift, three indicates an evening shift.
Weekend           Weekend shifts are given a value of one.
Production        Number of air conditioners produced
Factory Workers   Number of workers on duty
Managers          Number of managers on duty
Machines          Number of production machines online
Breakdown         A value of one indicates that at least one machine broke down.
Delivery          A value of one indicates that a delivery of parts occurred.
Afternoon         A value of one indicates an afternoon shift.

Variable          Scatter plot                                Correlation
Number            No obvious relationship
Shift             Production seems higher in the afternoon    No obvious relationship
Weekend           No obvious relationship                     No obvious relationship
Production
Factory Workers   Positive relationship with production       Positive relationship with production
Managers          Positive relationship with production       Positive relationship with production but multicollinearity with workers
Machines          Positive relationship with production       Positive relationship with production
Breakdown         No obvious relationship                     No obvious relationship
Delivery          No obvious relationship                     No obvious relationship
Afternoon         Maybe a positive relationship
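The Afternoon variable is a dummy variable derived from Shift. A minimal sketch of how it could be built, following the definitions above (the function name afternoon_dummy is only illustrative):

```python
def afternoon_dummy(shifts):
    """Return 1 for afternoon shifts (Shift == 2) and 0 for all other shifts."""
    return [1 if shift == 2 else 0 for shift in shifts]


# The first six observations follow the pattern morning, afternoon, evening.
shifts = [1, 2, 3, 1, 2, 3]
print(afternoon_dummy(shifts))  # [0, 1, 0, 0, 1, 0]
```

A dummy like this lets a model capture "production seems higher in the afternoon" without treating the shift codes 1, 2 and 3 as a quantity.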
Cell   Variable     Original value   New value   Reasoning
B5     Shift        4                1           There is no shift 4. The pattern shows this should be shift 1.
D3     Production   345.1            345         It seems reasonable to round this value to the nearest integer.
E6     Workers      72.25            72          It seems reasonable to round this value to the nearest integer.
H16    Breakdown    Missing value    0           This is the mode and seems reasonable.
I16    Delivery     Missing value    0           This is the mode and seems reasonable.
I8     Delivery     2                1           We assume this is the correct value for the binary variable.
B151   Shift        Missing value    3           We can determine the correct value from the pattern of data.
C151   Weekend      Missing value    1           We can determine the correct value from the pattern of data.

Remember you are free to clean the data differently. You just need to be able to justify your choices.
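The "pattern of data" used for the Shift and Weekend corrections can be written down explicitly. This sketch makes two assumptions. First, the spreadsheet has one header row, so cell row 5 holds observation 4. Second, the pattern inferred from the data table holds throughout: shifts cycle 1, 2, 3, and each week of 21 shifts starts with six weekend shifts.

```python
def observation_for_row(spreadsheet_row):
    """Assumption: row 1 is the header, so row 5 holds observation 4."""
    return spreadsheet_row - 1


def expected_shift(number):
    """Shifts cycle morning (1), afternoon (2), evening (3)."""
    return (number - 1) % 3 + 1


def expected_weekend(number):
    """Assumption: 21 shifts per week, the first six of which are weekend shifts."""
    return 1 if (number - 1) % 21 < 6 else 0


# Corrections from the table above.
print(expected_shift(observation_for_row(5)))      # cell B5
print(expected_shift(observation_for_row(151)))    # cell B151
print(expected_weekend(observation_for_row(151)))  # cell C151
```

The three printed values are 1, 3 and 1, matching the new values chosen for cells B5, B151 and C151.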
