Title
Ghost Rider: Spirit of Vengeance
The Cold Light of Day
Stolen
Resident Evil: Retribution
Red Dawn
The Man with the Iron Fists
Wrath of the Titans
Hit and Run
Haywire
Battleship
Lockout
This Means War
Snow White and the Huntsman
Act of Valor
Contraband
Taken 2
Safe
Premium Rush
The Bourne Legacy
John Carter
Safe House
The Expendables 2
Get the Gringo
The Amazing Spider-Man
Jack Reacher*
The Hunger Games
Dredd
The Raid: Redemption
End of Watch
Looper
Skyfall
The Avengers
The Dark Knight Rises
Django Unchained*
Madea's Witness Protection
Fun Size
The Three Stooges
One For The Money
That's My Boy
Mirror Mirror
Parental Guidance
Wanderlust
A Thousand Words
For a Good Time, Call...
Damsels in Distress
Think Like a Man
Budget ($) in Millions
57
20
35
65
65
20
150
2
23
209
20
65
170
12
25
45
33
35
125
250
85
100
20
230
60
78
45
1.1
7
30
200
220
250
83
20
14
30
40
70
85
6.5
35
40
5.7
3
12
Diary of a Wimpy Kid: Dog Days
Iron Sky
Friends with Kids
Magic Mike
The Campaign
The Five-Year Engagement
Dark Shadows
To Rome with Love
This Is 40*
The Dictator
Celeste and Jesse Forever
Jeff, Who Lives at Home
Project X
Your Sister's Sister
Seeking a Friend for the End of the World
American Reunion
Men in Black 3
Safety Not Guaranteed
The Best Exotic Marigold Hotel
21 Jump Street
Ted
Seven Psychopaths
Moonrise Kingdom
Good Deeds
Darling Companion
Won't Back Down
Cosmopolis
W.E.
Big Miracle
Deadfall
The Odd Life of Timothy Green
Compliance
Arbitrage
The Words
Salmon Fishing in the Yemen
Smashed
People Like Us
Anna Karenina
Hitchcock*
Beasts of the Southern Wild
We Need to Talk About Kevin
Flight
The Impossible
The Master
Silver Linings Playbook
Argo
The Perks of Being a Wallflower
22
7
10
7
56
30
150
24.8
35
65
8
10
12
0.125
10
50
215
0.75
10
42
65
15
16
14
12
19
20
29
30
12
25
10
13
6
14.5
5
16
50
15
1.8
7
31
45
35
21
44.5
13
Lincoln
Life of Pi
Total Recall
Chronicle
Prometheus
Cloud Atlas
60
120
125
15
130
102
*Movies released in recent week.
Q1(i)
Is it a good idea to make a bigger-budget movie for profit?
Ans
Not necessarily. The correlation, hypothesis test and regression model indicate that the budget of the movie is related to the weekend collection, gross collection and viewer rating.
(ii)
Why does the length of the movie affect the budget?
Ans
The length of the movie affects the budget because, for example, Sci-Fi and Action movies require special effects, which create excitement for the viewer.
(iii)
What is the correlation between viewer rating and gross collection?
Ans
The data for movies released during the year 2012 show only a weak correlation (about 0.34) between viewer rating and gross collection.
Q2)(a) Ans
(b)
Sources: www.Imdb.com, www.wikipedia.com, www.boxofficemojo.com
Units of measurement: Budget, weekend collection and gross collection are measured in $ millions; length is measured in minutes and viewer rating is scored out of 10.
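(c)
                     Budget ($) in Millions   First weekend collection ($) in Millions   Gross ($) in Millions   Length in minutes   Viewer Rating
Mean                 51.49267677              19.86907785                                143.1087879             109.6161616         6.586868687
Variance             3653.788039              1048.60121                                 59076.97557             383.1981035         1.045029891
Mode                 20                       11.579175                                  28                      95                  6.3
Standard Deviation   60.44657177              32.3821125                                 243.0575561             19.57544644         1.022267035
Q3)
Correlation and Equality of Means
                                           Budget        First weekend   Gross         Length        Viewer Rating
Budget ($) in Millions                     1
First weekend collection ($) in Millions   0.669285261   1
Gross ($) in Millions                      0.790045357   0.937311029     1
Length in minutes                          0.566813015   0.468210392     0.473707897   1
Viewer Rating                              0.233063936   0.277329405     0.344214701   0.440210241   1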
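The coefficients in the correlation matrix are assumed here to be Pearson correlations, which is what a spreadsheet correlation tool normally reports. As a minimal sketch of how one such coefficient is computed, the Python below applies the standard formula to the first five rows of the title list only (first weekend and gross, in $ millions), so its result will differ from the full-sample value of 0.937311029. The function name `pearson` is illustrative and not part of the original workbook.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# First five rows of the title list, in $ millions (illustrative subset only).
weekend = [22.12, 0.80, 0.18, 21.05, 14.28]
gross = [132.5, 16.8, 2.5, 221.6, 39.0]

r = pearson(weekend, gross)
print(f"r (first five rows) = {r:.3f}")
```

Running the same function over all 99 rows of the two columns would be the way to check the matrix entry itself.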
From the above table we can see that, for the movies released during the year 2012, the weekend collection has a big impact on the gross collection of the movies. We can see from the correlation matrix above that the coefficient of correlation with gross collection is highest for weekend collection (0.937311029): the higher the weekend collection, the higher the gross collection.
H0: The first week collection has no impact on gross collection for all the movies released during the year 2012.
Mathematically H0: µ first weekend collection - µ gross collection = 0
H1: Gross collection of all the movies released during the year 2012 is directly proportional to the weekend collection.
Mathematically H1: µ first weekend collection - µ gross collection ≠ 0
Let us consider α = 0.05 for this hypothesis test. To establish the same we will use a two-tail test, using the t statistic from a t-test of two samples assuming unequal variances, with the hypothesized mean difference as zero.
t-Test: Two-Sample Assuming Unequal Variances
                               First weekend collection ($) in Millions   Gross ($) in Millions
Mean                           19.86907785                                143.1087879
Variance                       1048.60121                                 59076.97557
Observations                   99                                         99
Hypothesized Mean Difference   0
df                             101
t Stat                         -5.000790229
P(T<=t) one-tail               1.20597E-06
t Critical one-tail            1.66008063
P(T<=t) two-tail               2.41194E-06
t Critical two-tail            1.983731003
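As we can see, the absolute value of the t stat (5.00) is greater than the two-tail t critical value (1.98), and the two-tail p-value (2.41194E-06) is below α = 0.05; hence we can reject H0 and conclude that H1 is true, which states that Gross collection of all the movies released during the year 2012 is directly proportional to the weekend collection.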
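The t Stat and df reported for this test can be reproduced from the means, variances and observation counts alone. The sketch below assumes the usual Welch formulas for a two-sample t-test with unequal variances; the function name `welch_t` is illustrative. The statistic comes out negative because the first-weekend mean is the smaller of the two.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    se1 = var1 / n1  # squared standard error of the first mean
    se2 = var2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics for all 99 movies: first weekend vs. gross ($ millions).
t, df = welch_t(19.86907785, 1048.60121, 99, 143.1087879, 59076.97557, 99)
print(f"t = {t:.6f}, df = {df:.2f}")
```

The degrees of freedom come out fractional; the table reports them rounded to a whole number.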
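Regression Model
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.937311029
R Square            0.878551965
Adjusted R Square   0.877299923
Standard Error      85.13958791
Observations        99
ANOVA
             df   SS            MS            F             Significance F
Regression   1    5086414.911   5086414.911   701.6955077   3.37199E-46
Residual     97   703128.6946   7248.749429
Total        98   5789543.605
                                           Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept                                  3.322237343    10.05320477      0.3304655     0.741760972   -16.63059126   23.27507
First weekend collection ($) in Millions   7.035381893    0.265590984      26.48953582   3.37199E-46   6.508257309    7.562506
RESIDUAL OUTPUT
Observation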
Predicted Gross ($) in Millions
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
158.9120577
8.951893652
4.610591653
151.432694
103.7640489
58.9790028
238.706332
44.83099051
62.59793289
182.9694828
47.1655835
125.7796021
398.835226
175.5246909
174.6324849
351.6775466
58.84926332
47.64514327
271.6715777
215.6513855
285.9526642
204.4734442
5.643040981
439.5488966
113.0741949
1076.46947
47.49381924
4.826296462
95.85638517
149.6690996
625.0017462
1462.732768
1135.225799
219.2240369
181.954629
32.17445809
122.9949628
84.3402178
97.97425322
130.8883798
107.4458894
49.23971258
46.77472582
4.334875036
3.734433333
239.9664744
106.204841
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
3.5332988
17.51588111
278.5968207
190.3822074
77.96806136
212.1694765
5.864535909
84.78615548
125.9847679
4.080545981
9.342476948
151.4266154
84.78615548
30.21711635
154.6820062
387.4032862
4.010030348
8.507672603
258.7249765
386.1539853
32.69435874
7.001713932
112.9610941
3.603385275
21.6379395
3.81709907
3.653420911
57.91824309
3.458660434
79.46549314
3.437807562
17.40823273
36.74659097
4.911487901
3.511791638
33.26076327
5.578413963
5.346422245
4.516155722
3.495216278
178.5072285
6.136390101
8.502466421
6.438932628
140.2174651
4.928830117
9.965804749
95
96
97
98
99
161.2772124
183.2715329
158.12947
362.4791936
70.94806584
[Figure: First weekend collection ($) in Millions Residual Plot. X axis: First weekend collection ($) in Millions; Y axis: Residuals.]
CONCLUSION
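The analysis above of movies released during the year 2012 indicates that gross collections were dependent on first weekend collections. In Action movies, viewer rating and length appeared to depend on the gross collection, but the regression test showed that only about 45.8% of the variance between viewer rating and length is explained, so the two are only partially correlated. For Comedy movies, gross collection was directly related to first weekend collection: the regression test showed that 70.8% of the variance in gross collection can be explained by the variance in first weekend collection, so the regression model fits well and supports our test. Drama movies were dependent on the budget and also related to weekend collection; 73.5% of the variance in gross collection can be explained by the variance in Budget, so the regression model fits well and supports our test. In our data, Sci-Fi movies released during the year 2012 make up a small share compared with the other genres. In Sci-Fi movies, budget and length were correlated, as budgets were higher for the Sci-Fi movies, but only 34% of the variance in Budget can be explained by the variance in length of the movie, so the regression model is not very accurate and only partially supports our test.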
PROJECT BY: SAYED ANWAR & HETAL KHATRI
MIB 2012 (SEPT)
First weekend collection($) in Mi Gross ($) in Millions
22.12
132.5
0.80
16.8
0.18
2.5
21.05
221.6
14.28
39
7.91
18.4
33.46
301.9
5.90
14.4
8.43
33.3
25.53
302
6.23
28
17.41
156.3
56.22
396.3
24.48
80.4
24.35
96.2
49.51
365
7.89
40.3
6.30
30.6
38.14
276
30.18
282.7
40.17
207.8
28.59
312.5
0.33
7.5
62.00
752
15.60
110
152.54
686.6
6.28
36.2
0.21
4.1
13.15
40
20.80
166
88.36
978
207.44
1550.1
160.89
1081
30.69
150
25.39
65.6
4.10
9.2
17.01
53
11.52
36.8
13.45
57.7
18.13
162.8
14.80
29.3
6.53
21.4
6.18
20.5
0.14
1.2
0.06
1.3
33.64
99.19
Length in minutes
95
93
96
95
114
96
99
100
93
131
95
97
127
110
110
91
95
91
135
132
115
103
96
136
130
142
95
101
109
118
143
143
165
165
114
90
92
91
114
106
105
98
91
85
99
123
Viewer Rating
4.4
4.8
5.3
5.3
5.5
5.8
5.8
5.9
5.9
6
6.1
6.3
6.3
6.4
6.4
6.4
6.5
6.6
6.7
6.7
6.8
7
7.1
7.2
7.3
7.3
7.4
7.6
7.7
7.8
8
8.4
8.7
8.8
3.9
5
5.1
5.1
5.5
5.5
5.6
5.6
5.6
5.7
6
6
14.62
0.03
2.02
39.13
26.59
10.61
29.69
0.36
11.58
17.44
0.11
0.86
21.05
11.58
3.82
21.51
54.59
0.10
0.74
36.30
54.42
4.17
0.52
15.58
0.04
2.60
0.07
0.05
7.76
0.02
10.82
0.02
2.00
4.75
0.23
0.03
4.26
0.32
0.29
0.17
0.02
24.90
0.40
0.74
0.44
19.46
0.23
76.5
8
12
165
103
53.7
238.7
73
20.7
177.5
26
4.5
101
1.1
9.6
234.7
624
4
134
202
501.7
15.1
65
35
7.9
5.2
6.5
0.89
24
0.45
51.6
31
23
11.4
34.5
2.9
12.4
27
4.5
11
6
95.5
60.3
18.8
32
159.6
28
94
93
100
110
85
124
150
112
133
83
92
83
88
90
101
113
106
86
124
109
106
110
94
111
103
121
109
119
107
95
104
90
100
96
107
81
114
130
98
93
112
139
113
143
122
120
102
6
6.1
6.1
6.2
6.2
6.3
6.3
6.4
6.5
6.5
6.6
6.6
6.6
6.7
6.7
6.9
6.9
7.1
7.2
7.2
7.3
7.8
7.9
4.3
4.6
4.9
5.3
5.4
6.3
6.4
6.5
6.7
6.7
6.8
6.8
7
7.1
7.1
7.3
7.5
7.5
7.5
7.7
7.8
8.2
8.2
8.3
0.94
22.45
25.58
122.2
240
198
126
402.52
65.6
22.00
51.05
9.61
150
127
118
83
124
171
8.3
8.3
6.3
7.1
7.2
8.1
on model indicate that the budget of the movie is depended on the weekend, gross and viewer rating.
and Action movies require's special effect which creates an excitement for viewer.
2 shows that there is no correlation between viewer rating and gross collection.
ojo.com
d Gross collection are measured in $ Millions.
of 10.
19.86907785
1048.60121
11.579175
32.3821125
143.1087879
59076.97557
28
243.0575561
First weekend collection($) in Mi Gross ($) in Millions
1
0.937311029
0.468210392
0.277329405
1
0.473707897
0.344214701
ring the year 2012, the weekend collection has a big impact on the
n matrix above that the coefficeient of correlation is highest for
lection i.e .(0.937311029) the higher the weekend collection the
n for all the movies released during the year 2012.
collection = 0
109.6161616
383.1981035
95
19.57544644
6.586868687
1.045029891
6.3
1.022267035
Length in minutes
Viewer Rating
1
0.440210241
1
collection = 0
012 is directly proportional to the weekend collection.
collection ≠ 0
ablish same we will use two tail test by using tstatistic using t test
mean difference as zero.
Gross ($) in Millions
143.1087879
59076.97557
99
can reject H0 and conclude that H1 is true which states that Gross
tly proportional to the weekend collection.
SS
5086414.911
703128.6946
5789543.605
Standard Error
10.05320477
0.265590984
Residuals
MS
5086414.911
7248.749429
t Stat
0.3304655
26.48953582
F
Significance F
701.6955077
3.37199E46
Pvalue
0.741760972
3.37199E46
Lower 95% Upper 95%
16.63059126 23.27507
6.508257309 7.562506
26.41205774
7.848106348
2.110591653
70.167306
64.76404889
40.5790028
63.19366799
30.43099051
29.29793289
119.0305172
19.1655835
30.5203979
2.535226016
95.12469093
78.4324849
13.32245337
18.54926332
17.04514327
4.328422285
67.04861446
78.15266424
108.0265558
1.856959019
312.4511034
3.074194882
389.8694699
11.29381924
0.726296462
55.85638517
16.33090036
352.9982538
87.36723239
54.22579948
69.22403689
116.354629
22.97445809
69.99496277
47.5402178
40.27425322
31.91162016
78.14588937
27.83971258
26.27472582
3.134875036
2.434433333
140.7764744
29.70484097
4.4667012
5.515881111
113.5968207
87.3822074
24.26806136
26.53052345
67.13546409
64.08615548
51.51523209
21.91945402
4.842476948
50.42661543
83.68615548
20.61711635
80.01799377
236.5967138
0.010030348
125.4923274
57.14497649
115.5460147
17.59435874
57.99828607
77.96109408
4.296614725
16.4379395
2.68290093
2.763420911
33.91824309
3.008660434
27.86549314
27.56219244
5.591767268
25.34659097
29.5885121
0.611791638
20.86076327
21.06158604
0.846422245
6.483844278
2.504783722
83.00722852
54.1636099
10.29753358
25.56106737
19.38253492
23.07116988
112.2341953
78.72278758
14.72846715
32.12946999
40.04080642
5.348065843
To establish the relation mentioned above we use regression analysis, taking Gross collection as the dependent variable and first weekend collection as the explanatory variable; hence we plot Gross collection on the Y axis and First weekend collection on the X axis. From the regression table we can see that R square is 87.9%, which indicates that 87.9% of the variance in gross collection can be explained by the variance in first weekend collection; the regression model therefore fits well and supports our hypothesis test.
Significance F is 3.37E-46, far below α = 0.05, which indicates that the probability of obtaining such a regression by chance is negligible; hence this model can be considered significant, again supporting our hypothesis.
Another observation can be drawn from the p-values of the coefficients: the p-value of the slope (first weekend collection) is 3.37E-46, so the slope is clearly significant, whereas the p-value of the Y intercept is 0.74, so the intercept is not significantly different from zero.
The fourth and most important inference comes from the residual plot, which does not follow a specific pattern; hence the regression can be considered robust, and the hypothesis that Gross collection is a dependent variable of first weekend collection holds true.
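The R Square, F and Standard Error figures in the regression output follow directly from the ANOVA sums of squares. The short sketch below recomputes them from the SS and df values reported for the all-movies regression; the variable names are illustrative.

```python
# ANOVA figures reported for the all-movies regression (gross on first weekend).
ss_regression = 5086414.911
ss_residual = 703128.6946
df_regression = 1
df_residual = 97

ss_total = ss_regression + ss_residual

# Share of the total variation explained by the regression line.
r_square = ss_regression / ss_total

# Mean squares, F statistic and standard error of the regression.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
standard_error = ms_residual ** 0.5

print(f"R Square       = {r_square:.6f}")
print(f"F              = {f_stat:.4f}")
print(f"Standard Error = {standard_error:.4f}")
```

A large F with one regression degree of freedom is what produces the very small Significance F.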
he year 2012 indicates that gross collections were dependent on first weekend
pended on the Gross collection but in Regression test we found out that only 45% are
movies data displayed they were directly coeffecient with gross collection which is
of population of gross collection can be explained by variance of first weekend
upporting our test. Drama movies were dependent on the budget and has relation with
of gross collection can be explained by variance of Budget, thus regression model is
Fi movies released during the year 2012 have small percentage compare to other
udget were higher for the scifi movies and that variance of only 34%of population of
hus regression model is not very accurate and hence partially supporting our test.
Title                              Budget ($) in Millions   First weekend collection ($) in Millions
Ghost Rider: Spirit of Vengeance   57.00                    22.12
The Cold Light of Day              20.00                    0.80
Stolen                             35.00                    0.18
Resident Evil: Retribution         65.00                    21.05
Red Dawn                           65.00                    14.28
The Man with the Iron Fists        20.00                    7.91
Wrath of the Titans                150.00                   33.46
Hit and Run                        2.00                     5.90
Haywire                            23.00                    8.43
Battleship                         209.00                   25.53
Lockout                            20.00                    6.23
This Means War                     65.00                    17.41
Snow White and the Huntsman        170.00                   56.22
Act of Valor                       12.00                    24.48
Contraband                         25.00                    24.35
Taken 2                            45.00                    49.51
Safe                               33.00                    7.89
Premium Rush                       35.00                    6.30
The Bourne Legacy                  125.00                   38.14
John Carter                        250.00                   30.18
Safe House                         85.00                    40.17
The Expendables 2                  100.00                   28.59
Get the Gringo                     20.00                    0.33
The Amazing Spider-Man             230.00                   62.00
Jack Reacher*                      60.00                    15.60
The Hunger Games                   78.00                    152.54
Dredd                              45.00                    6.28
The Raid: Redemption               1.10                     0.21
End of Watch                       7.00                     13.15
Looper                             30.00                    20.80
Skyfall                            200.00                   88.36
The Avengers                       220.00                   207.44
The Dark Knight Rises              250.00                   160.89
Django Unchained*                  83.00                    30.69
Correlation and Equality of Means
                                           Budget        First weekend   Gross         Length        Viewer Rating
Budget ($) in Millions                     1
First weekend collection ($) in Millions   0.63709825    1
Gross ($) in Millions                      0.767998296   0.942919066     1
Length in minutes                          0.702548668   0.678489345     0.669238903   1
Viewer Rating                              0.336751297   0.519736102     0.51819843    0.676707073   1
As the coefficient of correlation with viewer rating is highest for length (0.676707073), the length of action movies is related to the viewer rating. The more action scenes in the movie, the higher the viewer rating.
H0: Length of the movie has no impact on viewer rating for all Action movies released during the year 2012.
Mathematically H0: µ length of movies - µ viewer rating = 0
H1: Viewer rating of all the action movies released during the year 2012 is directly proportional to the length of the movie.
Mathematically H1: µ length of movies - µ viewer rating ≠ 0
Let us consider α = 0.05 for this hypothesis test. To establish the same we will use a two-tail test, using the t statistic from a t-test of two samples assuming unequal variances, with the hypothesized mean difference as zero.
t-Test: Two-Sample Assuming Unequal Variances
                               Viewer Rating   Length in minutes
Mean                           6.652941176     113.4117647
Variance                       1.100142602     466.9768271
Observations                   34              34
Hypothesized Mean Difference   0
df                             33
t Stat                         -28.77296391
P(T<=t) one-tail               2.98714E-25
t Critical one-tail            1.692360309
P(T<=t) two-tail               5.97428E-25
t Critical two-tail            2.034515297
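As we can see, the absolute value of the t stat (28.77) is greater than the two-tail t critical value (2.03), and the two-tail p-value (5.97428E-25) is below α = 0.05; hence we can reject H0 and conclude that H1 is true: viewer rating of all the action movies released during the year 2012 is directly proportional to the length of the movie. As is evident, action movies have special effects, which increase with the length of the movie and hence lead to a good viewer rating.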
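Regression Model
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.676707073
R Square            0.457932462
Adjusted R Square   0.440992852
Standard Error      16.15683708
Observations        34
ANOVA
             df   SS            MS            F            Significance F
Regression   1    7056.846996   7056.846996   27.0332344   1.1127E-05
Residual     32   8353.388298   261.0433843
Total        33   15410.23529
                Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept       20.65671279    18.05364614      1.144185093   0.261033735   -16.11736101   57.43078659
Viewer Rating   13.94196183    2.681481989      5.19934942    1.1127E-05    8.479961753    19.4039619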
RESIDUAL OUTPUT
Observation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Predicted Length in minutes
82.00134483
87.57812956
94.54911047
94.54911047
97.33750284
101.5200914
101.5200914
102.9142876
102.9142876
104.3084837
105.7026799
108.4910723
108.4910723
109.8852685
109.8852685
109.8852685
111.2794647
112.6736608
114.067857
114.067857
115.4620532
118.2504456
119.6446418
121.0388379
122.4330341
122.4330341
123.8272303
126.6156227
128.0098189
129.404015
132.1924074
32
33
34
137.7691921
141.9517807
143.3459769
Gross ($) in Millions
132.50
16.80
2.50
221.60
39.00
18.40
301.90
14.40
33.30
302.00
28.00
156.30
396.30
80.40
96.20
365.00
40.30
30.60
276.00
282.70
207.80
312.50
7.50
752.00
110.00
686.60
36.20
4.10
40.00
166.00
978.00
1550.10
1081.00
150.00
Length in minutes
95.00
93.00
96.00
95.00
114.00
96.00
99.00
100.00
93.00
131.00
95.00
97.00
127.00
110.00
110.00
91.00
95.00
91.00
135.00
132.00
115.00
103.00
96.00
136.00
130.00
142.00
95.00
101.00
109.00
118.00
143.00
143.00
165.00
165.00
Viewer Rating
4.40
4.80
5.30
5.30
5.50
5.80
5.80
5.90
5.90
6.00
6.10
6.30
6.30
6.40
6.40
6.40
6.50
6.60
6.70
6.70
6.80
7.00
7.10
7.20
7.30
7.30
7.40
7.60
7.70
7.80
8.00
8.40
8.70
8.80
First weekend collection($) inGross
Mi ($) in Millions Length in minutes Viewer Rating
1
0.942919066
0.678489345
0.519736102
1
0.669238903
0.51819843
ngth of action movies are related to the viewer rating
e the viewer rating.
1
0.676707073
1
ngth of action movies are related to the viewer rating
e the viewer rating.
ewer rating for all Action movies released during the
µ viewer rating = 0
eased during the yr 2012 is directly proportional to
µ gross collection ≠ 0
hesis test , to establish same we will use two tail test
assuming unequal variance with hypothesized mean
Length in minutes
113.4117647
466.9768271
34
two tail hence we can reject H0 and conclude that H1 is
leased during the yr 2012 is directly proportional to the
ovies have special effects which will be more as per
SS
7056.846996
8353.388298
15410.23529
MS
7056.846996
261.0433843
Standard Error
18.05364614
2.681481989
t Stat
1.144185093
5.19934942
Residuals
12.99865517
5.421870443
1.45088953
0.45088953
16.66249716
5.520091383
2.520091383
2.914287566
9.914287566
26.69151625
10.70267993
11.4910723
18.5089277
0.114731521
0.114731521
18.88526848
16.27946466
21.67366084
20.93214297
17.93214297
0.46205321
15.25044558
23.64464176
14.96116206
7.566965877
19.56696588
28.82723031
25.61562267
19.00981885
11.40401504
10.8075926
F
Significance F
27.0332344
1.1127E05
Pvalue
0.261033735
1.1127E05
Lower 95%
16.11736101
8.479961753
Upper 95%
Lower 95.0%
57.43078659 16.11736101
19.4039619
8.479961753
5.230807868
23.04821932
21.65402314
10.00
To establish the relation mentioned above we use regression analysis on viewer rating and length of the movie. In the regression output above, length of the movie is the fitted (Y axis) variable and viewer rating is the X axis variable; with a single explanatory variable, R square is the same whichever of the two is treated as dependent. From the regression table we can see that R square is only 45.8%, which indicates that only 45.8% of the variance in one variable can be explained by the variance in the other; the regression model is therefore not very accurate and only partially supports our hypothesis test.
Significance F is 1.1127E-05, well below α = 0.05, which indicates that the probability of obtaining such a regression by chance is very small; hence this model can be considered significant, again supporting our hypothesis.
Another observation can be drawn from the p-values of the coefficients: the p-value of the slope is 1.1127E-05, so the slope is significant, whereas the p-value of the Y intercept is 0.26, so the intercept is not significantly different from zero.
The fourth and most important inference comes from the residual plot, which does not follow a specific pattern; hence the regression can be considered robust, and the hypothesis that viewer rating and length of the movie depend on each other partially holds true.
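Each row of the residual output comes from the fitted line: the predicted value is intercept plus slope times the X value, and the residual is the actual value minus the prediction. The sketch below uses the Intercept and Viewer Rating coefficients from the Action regression output, where length in minutes is the fitted variable; the function names are illustrative.

```python
# Coefficients reported in the Action regression output.
INTERCEPT = 20.65671279  # predicted length (minutes) at a viewer rating of 0
SLOPE = 13.94196183      # extra minutes of predicted length per rating point


def predict_length(viewer_rating):
    """Predicted length in minutes for a given viewer rating."""
    return INTERCEPT + SLOPE * viewer_rating


def residual(actual_length, viewer_rating):
    """Actual length minus predicted length."""
    return actual_length - predict_length(viewer_rating)


# Observation 1: viewer rating 4.4, actual length 95 minutes.
print(f"{predict_length(4.4):.6f}  {residual(95, 4.4):.6f}")
# Observation 6: viewer rating 5.8, actual length 96 minutes (negative residual).
print(f"{predict_length(5.8):.6f}  {residual(96, 5.8):.6f}")
```

Residuals computed this way carry a sign, which matters when reading the residual plot for a pattern.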
Title                                       Budget ($) in Millions   First weekend collection ($) in Millions
Madea's Witness Protection                  20                       25.390575
Fun Size                                    14                       4.101017
The Three Stooges                           30                       17.010125
One For The Money                           40                       11.51579
That's My Boy                               70                       13.453714
Mirror Mirror                               85                       18.132085
Parental Guidance                           6.5                      14.8
Wanderlust                                  35                       6.52665
A Thousand Words                            40                       6.17628
For a Good Time, Call...                    5.7                      0.143935
Damsels in Distress                         3                        0.058589
Think Like a Man                            12                       33.636303
Diary of a Wimpy Kid: Dog Days              22                       14.623599
Iron Sky                                    7                        0.03
Friends with Kids                           10                       2.017466
Magic Mike                                  7                        39.12717
The Campaign                                56                       26.58846
The Five-Year Engagement                    30                       10.61006
Dark Shadows                                150                      29.685274
To Rome with Love                           24.8                     0.361359
This Is 40*                                 35                       11.579175
The Dictator                                65                       17.435092
Celeste and Jesse Forever                   8                        0.107785
Jeff, Who Lives at Home                     10                       0.855709
Project X                                   12                       21.051363
Your Sister's Sister                        0.125                    11.579175
Seeking a Friend for the End of the World   10                       3.822803
American Reunion                            50                       21.51408
Men in Black 3                              215                      54.592779
Safety Not Guaranteed                       0.75                     0.097762
The Best Exotic Marigold Hotel              10                       0.737051
21 Jump Street                              42                       36.302612
Ted                                         65                       54.415205
Seven Psychopaths                           15                       4.174915
Correlation and Equality of Means
                                           Budget        First weekend   Gross         Length        Viewer Rating
Budget ($) in Millions                     1
First weekend collection ($) in Millions   0.613835983   1
Gross ($) in Millions                      0.782226864   0.841320894     1
Length in minutes                          0.334702147   0.29738332      0.267165364   1
Viewer Rating                              0.082634184   0.109791556     0.330282543   0.110400891   1
In comedy movies the weekend collection matters. The weekend collection has a strong impact on the gross collection, as its coefficient of correlation is the highest (0.841320894). The budget does not have a strong relation with the length of the movie. The viewer rating does not matter because every person has their own taste in understanding comedy.
H0: The first week collection has no impact on gross collection for all the comedy movies released during the year 2012.
Mathematically H0: µ weekend collection - µ gross collection = 0
H1: Gross collection of all the comedy movies released during the year 2012 is directly proportional to the weekend collection.
Mathematically H1: µ weekend collection - µ gross collection ≠ 0
Let us consider α = 0.05 for this hypothesis test. To establish the same we will use a two-tail test, using the t statistic from a t-test of two samples assuming unequal variances, with the hypothesized mean difference as zero.
t-Test: Two-Sample Assuming Unequal Variances
                               First weekend collection ($) in Millions   Gross ($) in Millions
Mean                           15.06629285                                98.33441176
Variance                       228.0955534                                19036.42455
Observations                   34                                         34
Hypothesized Mean Difference   0
df                             34
t Stat                         -3.498155532
P(T<=t) one-tail               0.000663816
t Critical one-tail            1.690924255
P(T<=t) two-tail               0.001327631
t Critical two-tail            2.032244509
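As we can see, the absolute value of the t stat (3.50) is greater than the two-tail t critical value (2.03), and the two-tail p-value (0.001327631) is below α = 0.05; hence we can reject H0 and conclude that H1 is true, which states that Gross collection of all the comedy movies released during the year 2012 is directly proportional to the weekend collection.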
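Regression Model
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.841320894
R Square            0.707820847
Adjusted R Square   0.698690248
Standard Error      75.73546297
Observations        34
ANOVA
             df   SS            MS            F             Significance F
Regression   1    444654.479    444654.479    77.52184532   4.63733E-10
Residual     32   183547.5313   5735.860352
Total        33   628202.0102
                                           Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept                                  -17.46393553   18.48447165      -0.944789543   0.351845885   -55.11557216   20.1877011
First weekend collection ($) in Millions   7.685921708    0.872939017      8.804649074    4.63733E-10   5.907803117    9.464040298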
RESIDUAL OUTPUT
Observation
Predicted Gross ($) in Millions
1
177.686036
2
14.05616006
3
113.2745535
4
71.04552481
5
85.94025695
6
121.8978502
7
96.28770575
8
32.69938539
9
30.006469
10
16.35766239
11
17.01362506
12
241.0620559
13
94.93190147
14
17.23335788
15
1.957849803
16
283.2644297
17
186.8928864
18
64.08415495
19
210.6947563
20
14.68655854
21
71.53269696
22
116.5408166
23
16.63550846
24
10.88702315
25
144.3351923
26
71.53269696
27
11.91782903
28
147.891599
29
402.1318897
30
16.71254445
31
11.79901925
32
261.5550981
33
34
400.7670698
14.6241343
[Figure: First weekend collection ($) in Millions Residual Plot. X axis: First weekend collection ($) in Millions; Y axis: Residuals.]
Gross ($) in Millions
65.6
9.2
53
36.8
57.7
162.8
29.3
21.4
20.5
1.2
1.3
99.19
76.5
8
12
165
103
53.7
238.7
73
20.7
177.5
26
4.5
101
1.1
9.6
234.7
624
4
134
201.58
501.7
15.1
Length in minutes
114
90
92
91
114
106
105
98
91
85
99
123
94
93
100
110
85
124
150
112
133
83
92
83
88
90
101
113
106
86
124
109
106
110
Viewer Rating
3.9
5
5.1
5.1
5.5
5.5
5.6
5.6
5.6
5.7
6
6
6
6.1
6.1
6.2
6.2
6.3
6.3
6.4
6.5
6.5
6.6
6.6
6.6
6.7
6.7
6.9
6.9
7.1
7.2
7.2
7.3
7.8
First weekend collection($) inGross
Mi ($) in Millions Length in minutes Viewer Rating
1
0.841320894
0.29738332
0.109791556
1
0.267165364
0.330282543
atters. The weekend collection will have a strong
1
0.110400891
1
atters. The weekend collection will have a strong
ent of correlation is highest . The budget does not
ewer rating does not matter because every person has
n gross collection for all the comedy movies released
µ gross collection = 0
es released during the yr 2012 is directly proportional
µ gross collection ≠ 0
hesis test , to establish same we will use two tail test
assuming unequal variance with hypothesized mean
Gross ($) in Millions
98.33441176
19036.42455
34
two tail hence we can reject H0 and conclude that H1 is
the comedymovies released during the yr 2012 is
SS
444654.479
183547.5313
628202.0102
MS
444654.479
5735.860352
Standard Error
18.48447165
0.872939017
t Stat
0.944789543
8.804649074
Residuals
112.086036
4.856160057
60.27455346
34.24552481
28.24025695
40.90214982
66.98770575
11.29938539
9.506468997
17.55766239
18.31362506
141.8720559
18.43190147
25.23335788
13.9578498
118.2644297
83.89288636
10.38415495
28.00524369
87.68655854
50.83269696
60.95918345
42.63550846
15.38702315
43.33519233
70.43269696
2.317829035
86.80840104
221.8681103
20.71254445
145.7990192
59.97509809
F
Significance F
77.52184532
4.63733E10
Pvalue
0.351845885
4.63733E10
Lower 95%
55.11557216
5.907803117
Upper 95%
Lower 95.0%
20.1877011 55.11557216
9.464040298
5.907803117
100.9329302
0.475865701
To establish the relation mentioned above we use regression analysis, taking Gross collection as the dependent variable and first weekend collection as the explanatory variable; hence we plot Gross collection on the Y axis and First weekend collection on the X axis. From the regression table we can see that R square is 70.8%, which indicates that 70.8% of the variance in gross collection can be explained by the variance in first weekend collection; the regression model therefore fits well and supports our hypothesis test.
Significance F is 4.64E-10, far below α = 0.05, which indicates that the probability of obtaining such a regression by chance is negligible; hence this model can be considered significant, again supporting our hypothesis.
Another observation can be drawn from the p-values of the coefficients: the p-value of the slope (first weekend collection) is 4.64E-10, so the slope is clearly significant, whereas the p-value of the Y intercept is 0.35, so the intercept is not significantly different from zero.
The fourth and most important inference comes from the residual plot, which does not follow a specific pattern; hence the regression can be considered robust, and the hypothesis that Gross collection is a dependent variable of first weekend collection holds true.
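The t Stat of each coefficient is simply the coefficient divided by its standard error, and in a regression with one explanatory variable the F statistic equals the square of the slope's t Stat. The sketch below checks both against the Comedy regression output. The intercept is entered as negative, which is an assumption inferred from the predicted values (the sign does not survive in the printed figures); the function name `t_stat` is illustrative.

```python
def t_stat(coefficient, standard_error):
    """t statistic of a regression coefficient."""
    return coefficient / standard_error


# Comedy regression: gross collection on first weekend collection.
slope_t = t_stat(7.685921708, 0.872939017)
# Assumption: intercept is -17.46393553 (sign inferred from the predictions).
intercept_t = t_stat(-17.46393553, 18.48447165)

# With a single explanatory variable, F is the square of the slope's t Stat.
f_from_t = slope_t ** 2

print(f"slope t     = {slope_t:.6f}")
print(f"intercept t = {intercept_t:.6f}")
print(f"F from t^2  = {f_from_t:.4f}")
```

The slope's t Stat is far from zero while the intercept's is not, which is why only the slope has a small p-value.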
Title                             Budget ($) in Millions   First weekend collection ($) in Millions
Good Deeds                        14                       15.583924
Darling Companion                 12                       0.039962
Won't Back Down                   19                       2.60337
Cosmopolis                        20                       0.070339
W.E.                              29                       0.047074
Big Miracle                       30                       7.760205
Deadfall                          12                       0.019391
The Odd Life of Timothy Green     25                       10.822903
Compliance                        10                       0.016427
Arbitrage                         13                       2.002165
The Words                         6                        4.750894
Salmon Fishing in the Yemen       14.5                     0.225894
Smashed                           5                        0.026943
People Like Us                    16                       4.255423
Anna Karenina                     50                       0.32069
Hitchcock*                        15                       0.287715
Beasts of the Southern Wild       1.8                      0.169702
We Need to Talk About Kevin       7                        0.024587
Flight                            31                       24.900566
The Impossible                    45                       0.4
The Master                        35                       0.736311
Silver Linings Playbook           21                       0.443003
Argo                              44.5                     19.458109
The Perks of Being a Wallflower   13                       0.228359
Lincoln                           60                       0.944308
Life of Pi                        120                      22.451514
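Correlation and Equality of Means
                                           Budget        First weekend   Gross         Length        Viewer Rating
Budget ($) in Millions                     1
First weekend collection ($) in Millions   0.502686058   1
Gross ($) in Millions                      0.85712907    0.726299275     1
Length in minutes                          0.616787474   0.328596164     0.479212016   1
Viewer Rating                              0.384348525   0.110079505     0.485727023   0.279065717   1
In Drama movies the Gross collection is dependent on the budget of the movie, as their coefficient of correlation is the highest (0.85712907).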
H0: The gross collection for all the drama movies released during the year 2012 is independent of the budget of the movie.
Mathematically H0: µ budget - µ gross collection = 0
H1: Gross collection of all the drama movies released during the year 2012 is directly proportional to the budget of the movie.
Mathematically H1: µ budget - µ gross collection ≠ 0
Let us consider α = 0.05 for this hypothesis test. To establish the same we will use a two-tail test, using the t statistic from a t-test of two samples assuming unequal variances, with the hypothesized mean difference as zero.
t-Test: Two-Sample Assuming Unequal Variances
                               Budget ($) in Millions   Gross ($) in Millions
Mean                           25.72307692              40.43384615
Variance                       591.2858462              3154.842417
Observations                   26                       26
Hypothesized Mean Difference   0
df                             34
t Stat                         -1.225549156
P(T<=t) one-tail               0.114395125
t Critical one-tail            1.690924255
P(T<=t) two-tail               0.22879025
t Critical two-tail            2.032244509
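As we can see, the absolute value of the t stat (1.23) is less than the two-tail t critical value (2.03), and the two-tail p-value (0.22879025) is greater than α = 0.05, so this test does not reject H0: the mean budget and the mean gross collection of the drama movies are not significantly different. The claim in H1, that Gross collection of all the drama movies released during the year 2012 is directly proportional to the budget of the movie, therefore rests on the correlation above and on the regression model below rather than on this t-test.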
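The two-tail decision rule used in all four t-tests compares the absolute value of the t Stat with the two-tail critical value. The sketch below applies that rule to the t Stat and t Critical two-tail figures reported in the four tables; the function name `reject_h0` and the dictionary layout are illustrative.

```python
def reject_h0(t_stat, t_critical_two_tail):
    """Two-tail rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_critical_two_tail


# (t Stat magnitude, t Critical two-tail) as reported in each t-test table.
tests = {
    "All movies: first weekend vs gross": (5.000790229, 1.983731003),
    "Action: viewer rating vs length": (28.77296391, 2.034515297),
    "Comedy: first weekend vs gross": (3.498155532, 2.032244509),
    "Drama: budget vs gross": (1.225549156, 2.032244509),
}

decisions = {name: reject_h0(t, crit) for name, (t, crit) in tests.items()}
for name, rejected in decisions.items():
    print(f"{name}: {'reject H0' if rejected else 'fail to reject H0'}")
```

Under this rule only the Drama test falls short of its critical value, which agrees with its two-tail p-value of 0.22879025 being above 0.05.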
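Regression Model
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.85712907
R Square            0.734670243
Adjusted R Square   0.723614836
Standard Error      29.52882724
Observations        26
ANOVA
             df   SS            MS            F             Significance F
Regression   1    57944.2211    57944.2211    66.45348039   2.25863E-08
Residual     24   20926.83932   871.9516383
Total        25   78871.06042
                         Coefficients   Standard Error   t Stat         P-value       Lower 95%     Upper 95%
Intercept                -10.49446037   8.518614798      -1.231944467   0.229903996   -28.0760172   7.087096462
Budget ($) in Millions   1.979868376    0.242872002      8.151900416    2.25863E-08   1.4786052     2.481131551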
RESIDUAL OUTPUT
Observation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Predicted Gross ($) in Millions
17.22369689
13.26396014
27.12303877
29.10290714
46.92172252
48.9015909
13.26396014
39.00224902
9.304223388
15.24382851
1.384749886
18.21363108
0.59511849
21.18343364
88.49895841
19.20356527
6.930697291
3.364618261
50.88145927
78.59961653
58.80093278
31.08277552
77.60968234
15.24382851
108.2976422
227.0897447
[Figure: Budget ($) in Millions Residual Plot. X axis: Budget ($) in Millions; Y axis: Residuals.]
Gross ($) in Millions
35
7.9
5.2
6.5
0.89
24
0.45
51.6
31
23
11.4
34.5
2.9
12.4
26.64
4.5
11
6
95.5
60.3
18.8
32
159.6
28
122.2
240
Length in minutes
111
103
121
109
119
107
95
104
90
100
96
107
81
114
130
98
93
112
139
113
143
122
120
102
150
127
Viewer Rating
4.3
4.6
4.9
5.3
5.4
6.3
6.4
6.5
6.7
6.7
6.8
6.8
7
7.1
7.1
7.3
7.5
7.5
7.5
7.7
7.8
8.2
8.2
8.3
8.3
8.3
First weekend collection($) inGross
Mi ($) in Millions Length in minutes Viewer Rating
1
0.726299275
0.328596164
0.110079505
1
0.479212016
0.485727023
dent on the budget of the movie as the coefficient of
movies released during the year 2012 is independent of
1
0.279065717
1
movies released during the year 2012 is independent of
s collection = 0
released during the yr 2012 is directly proportional
s collection ≠ 0
hesis test , to establish same we will use two tail test
assuming unequal variance with hypothesized mean
Gross ($) in Millions
40.43384615
3154.842417
26
wo tail hence we can reject H0 and conclude that H1
l the drama movies released during the yr 2012 is
SS
57944.2211
20926.83932
78871.06042
MS
57944.2211
871.9516383
F
Significance F
66.45348039
2.25863E08
Standard Error
8.518614798
0.242872002
t Stat
1.231944467
8.151900416
Pvalue
0.229903996
2.25863E08
Lower 95%
28.0760172
1.4786052
Upper 95%
Lower 95.0%
7.087096462
28.0760172
2.481131551
1.4786052
Observation   Residuals
1             17.77630311
2             -5.363960139
3             -21.92303877
4             -22.60290714
5             -46.03172252
6             -24.9015909
7             -12.81396014
8             12.59775098
9             21.69577661
10            7.756171485
11            10.01525011
12            16.28636892
13            3.49511849
14            -8.783433641
15            -61.85895841
16            -14.70356527
17            17.93069729
18            2.635381739
19            44.61854073
20            -18.29961653
21            -40.00093278
22            0.917224481
23            81.99031766
24            12.75617149
25            13.90235784
26            12.9102553
To establish the relation described above, we use regression analysis with Gross collection as the dependent variable and Budget as the explanatory variable, so Gross collection is plotted on the Y axis and Budget on the X axis. The regression table gives an R square of about 73.5% (57944.22 / 78871.06), which means that about 73.5% of the variance in Gross collection is explained by the variance in Budget. The model therefore fits the data well and supports our hypothesis test.
The Significance F is 2.26E-08, so the probability of obtaining a regression this strong by chance is about 2.26 in 100 million, far below 0.05. The model can therefore be considered significant, again supporting our hypothesis.
Another observation comes from the p-values of the coefficients. The p-value of the Budget coefficient is also 2.26E-08, so the slope is clearly different from zero. The p-value of the Y intercept is 0.23, which is above 0.05, so the intercept is not significantly different from zero; this does not weaken the relation between Budget and Gross collection.
The fourth and most important inference comes from the residual plot: the residuals do not follow a specific pattern, so the regression can be considered robust, and the hypothesis that Gross collection depends on Budget holds true.
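As a quick check, the R square and F values discussed here can be recomputed from the ANOVA sums of squares. The following is a minimal Python sketch, assuming 26 observations and a single predictor (so 1 and 24 degrees of freedom); the variable names are illustrative and not part of the original workbook.

```python
# Sums of squares copied from the ANOVA block of the Gross vs Budget regression.
ss_regression = 57944.2211
ss_residual = 20926.83932
n_observations = 26   # assumption: one row per movie in this group
n_predictors = 1      # assumption: Budget is the only explanatory variable

ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total

# Degrees of freedom for a simple linear regression.
df_regression = n_predictors
df_residual = n_observations - n_predictors - 1

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

print(f"R square    = {r_square:.4f}")
print(f"MS residual = {ms_residual:.4f}")
print(f"F           = {f_statistic:.4f}")
```

The recomputed values should agree with the SS, MS and F columns of the regression output up to rounding.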
                         Upper 95.0%
Intercept                7.087096462
Budget ($) in Millions   2.481131551
Title          Budget ($) in Millions   First weekend collection ($) in Millions
Total Recall   125                      25.577758
Chronicle      15                       22.004098
Prometheus     130                      51.050101
Cloud Atlas    102                      9.612247
Correlation and Equality of Means
                                           Budget ($) in Millions
Budget ($) in Millions                     1
First weekend collection ($) in Millions   0.386608264
Gross ($) in Millions                      0.510235419
Length in minutes                          0.591570074
Viewer Rating                              -0.109305766
In Sci-Fi movies, the length and the budget of a movie are correlated. The public always wants Sci-Fi movies to be longer than movies of other genres, because the special effects in Sci-Fi movies create excitement for the public. Sci-Fi movies are made with high budgets, and more money invested in Sci-Fi effects leads to a higher budget.
H0: Budget of all Sci-Fi movies released during the year 2012 is independent of the length of the movie.
Mathematically H0: µ length − µ budget = 0
H1: Budget of all Sci-Fi movies released during the year 2012 is directly proportional to the length of the movie.
Mathematically H1: µ length − µ budget ≠ 0
Let us consider α = 0.05 for this hypothesis test. To establish it we will use a two-tail test with the t-statistic, using the t-test for two samples assuming unequal variances with the hypothesized mean difference as zero.
t-Test: Two-Sample Assuming Unequal Variances
                               Budget ($) in Millions
Mean                           93
Variance                       2852.666667
Observations                   4
Hypothesized Mean Difference   0
df                             5
t Stat                         -0.961115181
P(T<=t) one-tail               0.190317704
t Critical one-tail            2.015048373
P(T<=t) two-tail               0.380635408
t Critical two-tail            2.570581836
As we can see, the absolute value of t Stat (0.96) is less than t Critical two-tail (2.57), and the two-tail p-value (0.38) is greater than α = 0.05. Hence we cannot reject H0: this test does not provide evidence for H1, which states that the Budget of all Sci-Fi movies released during the year 2012 is directly proportional to the length of the movie.
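The t statistic and degrees of freedom of this test can be reproduced from the four Sci-Fi rows. The sketch below is a plain-Python version of the unequal-variance (Welch) two-sample t-test; the function name `welch_t` is hypothetical, and the critical value is copied from the test output rather than recomputed.

```python
import math
from statistics import mean, variance

budget = [125, 15, 130, 102]   # Budget ($) in Millions
length = [118, 83, 124, 171]   # Length in minutes

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va = variance(sample_a) / len(sample_a)   # sample variance / n
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df

t_stat, df = welch_t(budget, length)
t_critical_two_tail = 2.570581836   # copied from the t-test output

# H0 is rejected only when |t| exceeds the two-tail critical value.
reject_h0 = abs(t_stat) > t_critical_two_tail
print(f"t = {t_stat:.6f}, df = {df:.2f}, reject H0: {reject_h0}")
```

The spreadsheet reports the degrees of freedom rounded to a whole number, so the unrounded value printed here will differ slightly from the df row.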
Regression Model
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.591570074
R Square            0.349955153
Adjusted R Square   0.02493273
Standard Error      52.74032518
Observations        4

ANOVA
             df
Regression   1
Residual     2
Total        3

                    Coefficients
Intercept           -15.30259806
Length in minutes   0.873408049

RESIDUAL OUTPUT
Observation   Predicted Budget ($) in Millions
1             87.75955171
2             57.19026999
3             93
4             134.0501783
[Figure: Length in minutes Residual Plot. X axis: Length in minutes; Y axis: Residuals.]
Title          Gross ($) in Millions   Length in minutes   Viewer Rating
Total Recall   198                     118                 6.3
Chronicle      126                     83                  7.1
Prometheus     402.52                  124                 7.2
Cloud Atlas    65.6                    171                 8.1
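                                           First weekend collection ($) in Millions   Gross ($) in Millions   Length in minutes   Viewer Rating
First weekend collection ($) in Millions   1
Gross ($) in Millions                      0.990390527                                1
Length in minutes                          -0.319880107                               -0.205474061            1
Viewer Rating                              -0.360688156                               -0.34543705             0.648028511         1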
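The correlation matrix for the four Sci-Fi movies can be recomputed directly from the data columns. The following is a minimal Python sketch of the Pearson correlation; the dictionary keys and the helper name `pearson` are illustrative assumptions, not names from the original workbook.

```python
import math

# Data columns for Total Recall, Chronicle, Prometheus, Cloud Atlas (in that order).
columns = {
    "budget":        [125, 15, 130, 102],
    "first_weekend": [25.577758, 22.004098, 51.050101, 9.612247],
    "gross":         [198, 126, 402.52, 65.6],
    "length":        [118, 83, 124, 171],
    "rating":        [6.3, 7.1, 7.2, 8.1],
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Print the lower triangle, one row per variable.
names = list(columns)
for i, row in enumerate(names):
    cells = [f"{pearson(columns[row], columns[col]):8.4f}" for col in names[: i + 1]]
    print(f"{row:14s}" + " ".join(cells))
```

Computing the coefficients from the raw columns also shows which pairs are negatively correlated, which matters when reading the matrix.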
               Length in minutes
Mean           124
Variance       1308.666667
Observations   4
ANOVA
             SS          MS          F             Significance F
Regression   2994.9162   2994.9162   1.076710798   0.408429926
Residual     5563.0838   2781.5419
Total        8558

                    Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept           107.6529958      -0.142147443   0.899990505   -478.4960544   447.8908582
Length in minutes   0.841720018      1.03764676     0.408429926   -2.748220882   4.49503698

Observation   Residuals
1             37.24044829
2             -42.19026999
3             37
4             -32.0501783
To establish the relation described above, we use regression analysis with Budget as the dependent variable and length of the movie as the explanatory variable, so Budget is plotted on the Y axis and length of the movie on the X axis. The regression table gives an R square of only about 35%, which means that only about 35% of the variance in Budget is explained by the variance in length of the movie. The regression model is therefore not very accurate and only partially supports our hypothesis.
The Significance F is 0.408, so the probability of obtaining such a regression by chance is about 41%, well above α = 0.05. The model therefore cannot be considered statistically significant.
Another observation comes from the p-values of the coefficients: the p-value of the Y intercept is 0.90 and the p-value of the length coefficient is 0.41, so neither coefficient is significantly different from zero.
The fourth inference comes from the residual plot: the residuals do not follow a specific pattern, but with only four observations this says little about how robust the regression is. The hypothesis that Budget depends on the length of the movie therefore holds only partially, at best.
*** The small number of data points has produced these differing results, and any interpretation drawn from such a small pool of data is unreliable. ***
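With only four data points, the whole regression of Budget on length can be recomputed by hand. The sketch below uses ordinary least squares in plain Python; the variable names are illustrative, and it is a check under the assumption that the four Sci-Fi rows listed earlier are the full sample.

```python
length = [118, 83, 124, 171]   # X: Length in minutes
budget = [125, 15, 130, 102]   # Y: Budget ($) in Millions

n = len(length)
mean_x = sum(length) / n
mean_y = sum(budget) / n

# Sums of squares and cross products around the means.
sxx = sum((x - mean_x) ** 2 for x in length)
syy = sum((y - mean_y) ** 2 for y in budget)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(length, budget))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * x for x in length]
residuals = [y - p for y, p in zip(budget, predicted)]
r_square = 1 - sum(r ** 2 for r in residuals) / syy

print(f"intercept = {intercept:.6f}, slope = {slope:.6f}, R square = {r_square:.6f}")
for obs, (p, r) in enumerate(zip(predicted, residuals), start=1):
    print(f"{obs}: predicted = {p:.4f}, residual = {r:.4f}")
```

Each residual is the actual Budget minus the predicted Budget, so two of the four residuals come out negative.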
                    Lower 95.0%    Upper 95.0%
Intercept           -478.4960544   447.8908582
Length in minutes   -2.748220882   4.49503698
