
Genre

Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Action
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy

Title
Ghost Rider: Spirit of Vengeance
The Cold Light of Day
Stolen
Resident Evil: Retribution
Red Dawn
The Man with the Iron Fists
Wrath of the Titans
Hit and Run
Haywire
Battleship
Lockout
This Means War
Snow White and the Huntsman
Act of Valor
Contraband
Taken 2
Safe
Premium Rush
The Bourne Legacy
John Carter
Safe House
The Expendables 2
Get the Gringo
The Amazing Spider-Man
Jack Reacher*
The Hunger Games
Dredd
The Raid: Redemption
End of Watch
Looper
Skyfall
The Avengers
The Dark Knight Rises
Django Unchained*
Madea's Witness Protection
Fun Size
The Three Stooges
One For The Money
That's My Boy
Mirror Mirror
Parental Guidance
Wanderlust
A Thousand Words
For a Good Time, Call...
Damsels in Distress
Think Like a Man

Budget ($)in Millions
57
20
35
65
65
20
150
2
23
209
20
65
170
12
25
45
33
35
125
250
85
100
20
230
60
78
45
1.1
7
30
200
220
250
83
20
14
30
40
70
85
6.5
35
40
5.7
3
12

Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Comedy
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama
Drama

Diary of a Wimpy Kid: Dog Days
Iron Sky
Friends with Kids
Magic Mike
The Campaign
The Five-Year Engagement
Dark Shadows
To Rome with Love
This Is 40*
The Dictator
Celeste and Jesse Forever
Jeff, Who Lives at Home
Project X
Your Sister's Sister
Seeking a Friend for the End of the World
American Reunion
Men in Black 3
Safety Not Guaranteed
The Best Exotic Marigold Hotel
21 Jump Street
Ted
Seven Psychopaths
Moonrise Kingdom
Good Deeds
Darling Companion
Won't Back Down
Cosmopolis
W.E.
Big Miracle
Deadfall
The Odd Life of Timothy Green
Compliance
Arbitrage
The Words
Salmon Fishing in the Yemen
Smashed
People Like Us
Anna Karenina
Hitchcock*
Beasts of the Southern Wild
We Need to Talk About Kevin
Flight
The Impossible
The Master
Silver Linings Playbook
Argo
The Perks of Being a Wallflower

22
7
10
7
56
30
150
24.8
35
65
8
10
12
0.125
10
50
215
0.75
10
42
65
15
16
14
12
19
20
29
30
12
25
10
13
6
14.5
5
16
50
15
1.8
7
31
45
35
21
44.5
13

Drama
Drama
Sci-Fi
Sci-Fi
Sci-Fi
Sci-Fi

Lincoln
Life of Pi
Total Recall
Chronicle
Prometheus
Cloud Atlas

60
120
125
15
130
102

*Movies released in recent week.
Q1(i) Is it a good idea to make a bigger budget movie for profit?
Ans: Not necessarily. The correlation, hypothesis test and regression model indicate that the budget of the movie is dependent on the weekend collection, gross collection and viewer rating.

(ii) Why does the length of the movie affect the budget?
Ans: The length of the movie affects the budget because, for example, Sci-Fi and Action movies require special effects, which create excitement for the viewer.

(iii) What is the correlation between viewer rating and gross collection?
Ans: The data for movies released during the year 2012 show no strong correlation between viewer rating and gross collection (coefficient of correlation 0.344214701).

Q2)(a) Sources: www.Imdb.com, www.wikipedia.com, www.boxofficemojo.com
(b) Units of measurement: budget, first weekend collection and gross collection are measured in $ millions; length is measured in minutes; viewer rating is counted out of 10.

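(c) Descriptive statistics (all genres combined)

                     Budget ($M)   First weekend ($M)   Gross ($M)    Length (min)   Viewer Rating
Mean                 51.49267677   19.86907785          143.1087879   109.6161616    6.586868687
Variance             3653.788039   1048.60121           59076.97557   383.1981035    1.045029891
Mode                 20            11.579175            28            95             6.3
Standard Deviation   60.44657177   32.3821125           243.0575561   19.57544644    1.022267035

Q3)

Correlation and Equality of Means

                     Budget ($M)   First weekend ($M)   Gross ($M)    Length (min)   Viewer Rating
Budget ($M)          1
First weekend ($M)   0.669285261   1
Gross ($M)           0.790045357   0.937311029          1
Length (min)         0.566813015   0.468210392          0.473707897   1
Viewer Rating        0.233063936   0.277329405          0.344214701   0.440210241    1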
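
The entries of the correlation matrix are Pearson correlation coefficients. As a minimal sketch of how one such entry is obtained, the snippet below computes the coefficient for the budgets and first weekend collections of the first five action titles in the data sheet. The function name `pearson_r` is our own illustration; the project itself appears to have used a spreadsheet tool, so this is not the authors' procedure.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# First five action titles: budget and first weekend collection, both in $ millions.
budget = [57, 20, 35, 65, 65]
weekend = [22.12, 0.80, 0.18, 21.05, 14.28]

r = pearson_r(budget, weekend)
print(round(r, 3))
```

Applying the same function to every pair of full columns would reproduce the whole matrix; five rows are used here only to keep the example short.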

From the correlation matrix above we can see that, for the movies released during the year 2012, the weekend collection has a big impact on the gross collection of the movies: the coefficient of correlation is highest between first weekend collection and gross collection (0.937311029). The higher the weekend collection, the higher the gross collection.

H0: The first weekend collection has no impact on gross collection for all the movies released during the year 2012.
Mathematically H0 : µ first weekend collection - µ gross collection = 0

H1: Gross collection of all the movies released during the year 2012 is directly proportional to the weekend collection.
Mathematically H1 : µ first weekend collection - µ gross collection ≠ 0

We take α = 0.05 and use a two-tailed, two-sample t-test assuming unequal variances, with a hypothesized mean difference of zero.

t-Test: Two-Sample Assuming Unequal Variances

                               First weekend collection ($M)   Gross ($M)
Mean                           19.86907785                     143.1087879
Variance                       1048.60121                      59076.97557
Observations                   99                              99
Hypothesized Mean Difference   0
df                             101
t Stat                         -5.000790229
P(T<=t) one-tail               1.20597E-06
t Critical one-tail            1.66008063
P(T<=t) two-tail               2.41194E-06
t Critical two-tail            1.983731003

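Since the absolute value of t Stat (5.00) is greater than t Critical two-tail (1.98), and the two-tail p-value (2.41194E-06) is below α = 0.05, we can reject H0 and conclude that H1 is true, which states that gross collection of all the movies released during the year 2012 is directly proportional to the weekend collection. Strictly, this test only shows that the two means differ; the dependence itself is examined in the regression model that follows.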
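
As a check on the table, the t statistic and degrees of freedom of an unequal-variance (Welch) t-test can be recomputed from the reported means, variances and observation counts alone. The sketch below assumes that this is the test behind the spreadsheet output, as its title suggests; the function name `welch_t` is ours.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    se1 = var1 / n1  # squared standard error of the first mean
    se2 = var2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics from the t-test table: first weekend collection vs gross.
t_stat, df = welch_t(19.86907785, 1048.60121, 99,
                     143.1087879, 59076.97557, 99)
print(round(t_stat, 4), round(df, 1))
```

The recomputed t statistic agrees with the reported t Stat, and the degrees of freedom fall between 101 and 102, consistent with the df of 101 shown in the table.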

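Regression Model
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.937311029
R Square            0.878551965
Adjusted R Square   0.877299923
Standard Error      85.13958791
Observations        99

ANOVA
             df   SS            MS            F             Significance F
Regression   1    5086414.911   5086414.911   701.6955077   3.37199E-46
Residual     97   703128.6946   7248.749429
Total        98   5789543.605

                                Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept                       3.322237343    10.05320477      0.3304655     0.741760972   -16.63059126   23.27507
First weekend collection ($M)   7.035381893    0.265590984      26.48953582   3.37199E-46   6.508257309    7.562506

RESIDUAL OUTPUT
Observation
Predicted Gross ($) in Millions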

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47

158.9120577
8.951893652
4.610591653
151.432694
103.7640489
58.9790028
238.706332
44.83099051
62.59793289
182.9694828
47.1655835
125.7796021
398.835226
175.5246909
174.6324849
351.6775466
58.84926332
47.64514327
271.6715777
215.6513855
285.9526642
204.4734442
5.643040981
439.5488966
113.0741949
1076.46947
47.49381924
4.826296462
95.85638517
149.6690996
625.0017462
1462.732768
1135.225799
219.2240369
181.954629
32.17445809
122.9949628
84.3402178
97.97425322
130.8883798
107.4458894
49.23971258
46.77472582
4.334875036
3.734433333
239.9664744
106.204841

48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94

3.5332988
17.51588111
278.5968207
190.3822074
77.96806136
212.1694765
5.864535909
84.78615548
125.9847679
4.080545981
9.342476948
151.4266154
84.78615548
30.21711635
154.6820062
387.4032862
4.010030348
8.507672603
258.7249765
386.1539853
32.69435874
7.001713932
112.9610941
3.603385275
21.6379395
3.81709907
3.653420911
57.91824309
3.458660434
79.46549314
3.437807562
17.40823273
36.74659097
4.911487901
3.511791638
33.26076327
5.578413963
5.346422245
4.516155722
3.495216278
178.5072285
6.136390101
8.502466421
6.438932628
140.2174651
4.928830117
9.965804749

95
96
97
98
99

161.2772124
183.2715329
158.12947
362.4791936
70.94806584

[Figure: First weekend collection ($) in Millions Residual Plot. x-axis: First weekend collection ($) in Millions; y-axis: Residuals.]

CONCLUSION

The analysis above of movies released during the year 2012 indicates that gross collections were dependent on first weekend collections. In action movies, viewer rating and length appeared to depend on the gross collection, but in the regression test we found that only 45.8% of the variance linking viewer rating and length is explained. For comedy movies, first weekend collection was directly correlated with gross collection, which is explained by the regression test: 70.8% of the variance of gross collection can be explained by the variance of first weekend collection, so the regression model is very accurate and supports our test. Drama movies were dependent on the budget and related to weekend collection: 73.4% of the variance of gross collection can be explained by the variance of budget, so the regression model is very accurate and supports our test. In our data, the Sci-Fi movies released during the year 2012 make up a small percentage compared to the other genres. In Sci-Fi movies, budget and length were correlated, as budgets were higher for the Sci-Fi movies; only 34% of the variance of budget can be explained by the variance of length, so the regression model is not very accurate and only partially supports our test.

PROJECT BY: SAYED ANWAR & HETAL KHATRI
MIB 2012 (SEPT)

First weekend collection($) in Mi Gross ($) in Millions
22.12
132.5
0.80
16.8
0.18
2.5
21.05
221.6
14.28
39
7.91
18.4
33.46
301.9
5.90
14.4
8.43
33.3
25.53
302
6.23
28
17.41
156.3
56.22
396.3
24.48
80.4
24.35
96.2
49.51
365
7.89
40.3
6.30
30.6
38.14
276
30.18
282.7
40.17
207.8
28.59
312.5
0.33
7.5
62.00
752
15.60
110
152.54
686.6
6.28
36.2
0.21
4.1
13.15
40
20.80
166
88.36
978
207.44
1550.1
160.89
1081
30.69
150
25.39
65.6
4.10
9.2
17.01
53
11.52
36.8
13.45
57.7
18.13
162.8
14.80
29.3
6.53
21.4
6.18
20.5
0.14
1.2
0.06
1.3
33.64
99.19

Length in minutes
95
93
96
95
114
96
99
100
93
131
95
97
127
110
110
91
95
91
135
132
115
103
96
136
130
142
95
101
109
118
143
143
165
165
114
90
92
91
114
106
105
98
91
85
99
123

Viewer Rating
4.4
4.8
5.3
5.3
5.5
5.8
5.8
5.9
5.9
6
6.1
6.3
6.3
6.4
6.4
6.4
6.5
6.6
6.7
6.7
6.8
7
7.1
7.2
7.3
7.3
7.4
7.6
7.7
7.8
8
8.4
8.7
8.8
3.9
5
5.1
5.1
5.5
5.5
5.6
5.6
5.6
5.7
6
6

14.62

0.03
2.02
39.13
26.59
10.61
29.69
0.36

11.58
17.44
0.11
0.86
21.05

11.58
3.82
21.51
54.59
0.10
0.74
36.30
54.42
4.17
0.52
15.58
0.04
2.60
0.07
0.05
7.76
0.02
10.82
0.02
2.00
4.75
0.23
0.03
4.26
0.32
0.29
0.17
0.02
24.90

0.40
0.74
0.44
19.46
0.23

76.5
8
12
165
103
53.7
238.7
73
20.7
177.5
26
4.5
101
1.1
9.6
234.7
624
4
134
202
501.7
15.1
65
35
7.9
5.2
6.5
0.89
24
0.45
51.6
31
23
11.4
34.5
2.9
12.4
27

4.5
11
6
95.5
60.3
18.8
32
159.6
28

94
93
100
110
85
124
150
112
133
83
92
83
88
90
101
113
106
86
124
109
106
110
94
111
103
121
109
119
107
95
104
90
100
96
107
81
114
130
98
93
112
139
113
143
122
120
102

6
6.1
6.1
6.2
6.2
6.3
6.3
6.4
6.5
6.5
6.6
6.6
6.6
6.7
6.7
6.9
6.9
7.1
7.2
7.2
7.3
7.8
7.9
4.3
4.6
4.9
5.3
5.4
6.3
6.4
6.5
6.7
6.7
6.8
6.8
7
7.1
7.1
7.3
7.5
7.5
7.5
7.7
7.8
8.2
8.2
8.3

0.94
22.45
25.58

122.2
240
198
126
402.52
65.6

22.00
51.05
9.61

150
127
118
83
124
171

8.3
8.3
6.3
7.1
7.2
8.1

on model indicate that the budget of the movie is depended on the weekend, gross and viewer rating.

and Action movies require's special effect which creates an excitement for viewer.

2 shows that there is no correlation between viewer rating and gross collection.

ojo.com
d Gross collection are measured in $ Millions.
of 10.
19.86907785
1048.60121
11.579175
32.3821125

143.1087879
59076.97557
28
243.0575561

First weekend collection($) in Mi Gross ($) in Millions
1
0.937311029
0.468210392
0.277329405

1
0.473707897
0.344214701

ring the year 2012, the weekend collection has a big impact on the
n matrix above that the co-efficeient of correlation is highest for
lection i.e .(0.937311029) the higher the weekend collection the

n for all the movies released during the year 2012.
collection = 0

109.6161616
383.1981035
95
19.57544644

6.586868687
1.045029891
6.3
1.022267035

Length in minutes

Viewer Rating

1
0.440210241

1

collection = 0

012 is directly proportional to the weekend collection.
collection ≠ 0

ablish same we will use two tail test by using t-statistic using t test
mean difference as zero.

Gross ($) in Millions
143.1087879
59076.97557
99

can reject H0 and conclude that H1 is true which states that Gross
tly proportional to the weekend collection.

SS
5086414.911
703128.6946
5789543.605
Standard Error
10.05320477
0.265590984
Residuals

MS
5086414.911
7248.749429
t Stat
0.3304655
26.48953582

F
Significance F
701.6955077
3.37199E-46

P-value
0.741760972
3.37199E-46

Lower 95% Upper 95%
-16.63059126 23.27507
6.508257309 7.562506

-26.41205774
7.848106348
-2.110591653
70.167306
-64.76404889
-40.5790028
63.19366799
-30.43099051
-29.29793289
119.0305172
-19.1655835
30.5203979
-2.535226016
-95.12469093
-78.4324849
13.32245337
-18.54926332
-17.04514327
4.328422285
67.04861446
-78.15266424
108.0265558
1.856959019
312.4511034
-3.074194882
-389.8694699
-11.29381924
-0.726296462
-55.85638517
16.33090036
352.9982538
87.36723239
-54.22579948
-69.22403689
-116.354629
-22.97445809
-69.99496277
-47.5402178
-40.27425322
31.91162016
-78.14588937
-27.83971258
-26.27472582
-3.134875036
-2.434433333
-140.7764744
-29.70484097

4.4667012
-5.515881111
-113.5968207
-87.3822074
-24.26806136
26.53052345
67.13546409
-64.08615548
51.51523209
21.91945402
-4.842476948
-50.42661543
-83.68615548
-20.61711635
80.01799377
236.5967138
-0.010030348
125.4923274
-57.14497649
115.5460147
-17.59435874
57.99828607
-77.96109408
4.296614725
-16.4379395
2.68290093
-2.763420911
-33.91824309
-3.008660434
-27.86549314
27.56219244
5.591767268
-25.34659097
29.5885121
-0.611791638
-20.86076327
21.06158604
-0.846422245
6.483844278
2.504783722
-83.00722852
54.1636099
10.29753358
25.56106737
19.38253492
23.07116988
112.2341953

78.72278758
14.72846715
-32.12946999
40.04080642
-5.348065843

To establish the relation mentioned above we use regression analysis, taking gross collection as the dependent variable and first weekend collection as the explanatory variable; hence we plot gross collection on the Y axis and first weekend collection on the X axis. The regression table shows that R Square is 87.9%, which indicates that 87.9% of the variance of gross collection can be explained by the variance of first weekend collection. The regression model therefore fits well and supports our hypothesis test.
Significance F is 3.37199E-46, far below α = 0.05, so the probability of obtaining such a regression by chance is negligible and the model can be considered significant, again supporting our hypothesis.
A further observation concerns the Y intercept: its p-value is 0.74, well above 0.05, so the intercept is not significantly different from zero. The slope for first weekend collection, by contrast, has a p-value of 3.37199E-46 and is highly significant.
The fourth and most important inference comes from the residual plot, which does not follow a specific pattern; hence the regression can be considered robust, and the hypothesis that gross collection is a dependent variable of first weekend collection holds true.
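
The fitted line and the headline statistics can be recovered from the regression output with a few lines of arithmetic. The sketch below uses only numbers reported in the SUMMARY OUTPUT; the constant and function names are our own, and the first weekend figure of 22.12 is the rounded value from the data sheet for observation 1, so the prediction differs slightly from the 158.9120577 listed in the residual output.

```python
# Values copied from the SUMMARY OUTPUT of the all-movies regression.
INTERCEPT = 3.322237343
SLOPE = 7.035381893           # gross ($M) per $M of first weekend collection
SS_REGRESSION = 5086414.911
SS_RESIDUAL = 703128.6946
DF_REGRESSION = 1
DF_RESIDUAL = 97

def predict_gross(first_weekend_musd):
    """Predicted gross collection ($M) from the fitted line."""
    return INTERCEPT + SLOPE * first_weekend_musd

# R Square is the share of total variation captured by the regression.
r_square = SS_REGRESSION / (SS_REGRESSION + SS_RESIDUAL)
# F is the ratio of the regression mean square to the residual mean square.
f_stat = (SS_REGRESSION / DF_REGRESSION) / (SS_RESIDUAL / DF_RESIDUAL)

print(round(predict_gross(22.12), 1))
print(round(r_square, 4))
print(round(f_stat, 1))
```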

he year 2012 indicates that gross collections were dependent on first weekend
pended on the Gross collection but in Regression test we found out that only 45% are
movies data displayed they were directly coeffecient with gross collection which is
of population of gross collection can be explained by variance of first weekend
upporting our test. Drama movies were dependent on the budget and has relation with
of gross collection can be explained by variance of Budget, thus regression model is
Fi movies released during the year 2012 have small percentage compare to other
udget were higher for the sci-fi movies and that variance of only 34%of population of
hus regression model is not very accurate and hence partially supporting our test.

Lower 95.0%
Upper 95.0%
-16.6306 23.27507
6.508257 7.562506

Title
Ghost Rider: Spirit of Vengeance
The Cold Light of Day
Stolen
Resident Evil: Retribution
Red Dawn
The Man with the Iron Fists
Wrath of the Titans
Hit and Run
Haywire
Battleship
Lockout
This Means War
Snow White and the Huntsman
Act of Valor
Contraband
Taken 2
Safe
Premium Rush
The Bourne Legacy
John Carter
Safe House
The Expendables 2
Get the Gringo
The Amazing Spider-Man
Jack Reacher*
The Hunger Games
Dredd
The Raid: Redemption
End of Watch
Looper
Skyfall
The Avengers
The Dark Knight Rises
Django Unchained*

Budget ($)in Millions First weekend collection($) in Mi
57.00
22.12
20.00
0.80
35.00
0.18
65.00
21.05
65.00
14.28
20.00
7.91
150.00
33.46
2.00
5.90
23.00
8.43
209.00
25.53
20.00
6.23
65.00
17.41
170.00
56.22
12.00
24.48
25.00
24.35
45.00
49.51
33.00
7.89
35.00
6.30
125.00
38.14
250.00
30.18
85.00
40.17
100.00
28.59
20.00
0.33
230.00
62.00
60.00
15.60
78.00
152.54
45.00
6.28
1.10
0.21
7.00
13.15
30.00
20.80
200.00
88.36
220.00
207.44
250.00
160.89
83.00
30.69

Correlation and Equality of Means

                     Budget ($M)   First weekend ($M)   Gross ($M)    Length (min)   Viewer Rating
Budget ($M)          1
First weekend ($M)   0.63709825    1
Gross ($M)           0.767998296   0.942919066          1
Length (min)         0.702548668   0.678489345          0.669238903   1
Viewer Rating        0.336751297   0.519736102          0.51819843    0.676707073    1

As the coefficient of correlation between length and viewer rating is high for action movies (0.676707073), the length of action movies is related to the viewer rating. The more action scenes in the movie, the higher the viewer rating.

H0: Length of the movie has no impact on viewer rating for all Action movies released during the year 2012.
Mathematically H0 : µ length of movies - µ viewer rating = 0

H1: Viewer rating of all the action movies released during the year 2012 is directly proportional to the length of the movie.
Mathematically H1 : µ length of movies - µ viewer rating ≠ 0

We take α = 0.05 and use a two-tailed, two-sample t-test assuming unequal variances, with a hypothesized mean difference of zero.

t-Test: Two-Sample Assuming Unequal Variances

                               Viewer Rating   Length in minutes
Mean                           6.652941176     113.4117647
Variance                       1.100142602     466.9768271
Observations                   34              34
Hypothesized Mean Difference   0
df                             33
t Stat                         -28.77296391
P(T<=t) one-tail               2.98714E-25
t Critical one-tail            1.692360309
P(T<=t) two-tail               5.97428E-25
t Critical two-tail            2.034515297

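Since the absolute value of t Stat (28.77) is greater than t Critical two-tail (2.03), and the two-tail p-value (5.97428E-25) is below α = 0.05, we can reject H0 and conclude that H1 is true: viewer rating of all the action movies released during the year 2012 is directly proportional to the length of the movie. Action movies rely on special effects, which increase with the length of the movie and hence lead to a good viewer rating.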

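Regression Model
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.676707073
R Square            0.457932462
Adjusted R Square   0.440992852
Standard Error      16.15683708
Observations        34

ANOVA
             df   SS            MS            F            Significance F
Regression   1    7056.846996   7056.846996   27.0332344   1.1127E-05
Residual     32   8353.388298   261.0433843
Total        33   15410.23529

                Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept       20.65671279    18.05364614      1.144185093   0.261033735   -16.11736101   57.43078659
Viewer Rating   13.94196183    2.681481989      5.19934942    1.1127E-05    8.479961753    19.4039619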
RESIDUAL OUTPUT
Observation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31

Predicted Length in minutes
82.00134483
87.57812956
94.54911047
94.54911047
97.33750284
101.5200914
101.5200914
102.9142876
102.9142876
104.3084837
105.7026799
108.4910723
108.4910723
109.8852685
109.8852685
109.8852685
111.2794647
112.6736608
114.067857
114.067857
115.4620532
118.2504456
119.6446418
121.0388379
122.4330341
122.4330341
123.8272303
126.6156227
128.0098189
129.404015
132.1924074

32
33
34

137.7691921
141.9517807
143.3459769


Gross ($) in Millions
132.50
16.80
2.50
221.60
39.00
18.40
301.90
14.40
33.30
302.00
28.00
156.30
396.30
80.40
96.20
365.00
40.30
30.60
276.00
282.70
207.80
312.50
7.50
752.00
110.00
686.60
36.20
4.10
40.00
166.00
978.00
1550.10
1081.00
150.00

Length in minutes
95.00
93.00
96.00
95.00
114.00
96.00
99.00
100.00
93.00
131.00
95.00
97.00
127.00
110.00
110.00
91.00
95.00
91.00
135.00
132.00
115.00
103.00
96.00
136.00
130.00
142.00
95.00
101.00
109.00
118.00
143.00
143.00
165.00
165.00

Viewer Rating
4.40
4.80
5.30
5.30
5.50
5.80
5.80
5.90
5.90
6.00
6.10
6.30
6.30
6.40
6.40
6.40
6.50
6.60
6.70
6.70
6.80
7.00
7.10
7.20
7.30
7.30
7.40
7.60
7.70
7.80
8.00
8.40
8.70
8.80

First weekend collection($) inGross
Mi ($) in Millions Length in minutes Viewer Rating
1
0.942919066
0.678489345
0.519736102

1
0.669238903
0.51819843

ngth of action movies are related to the viewer rating
e the viewer rating.

1
0.676707073

1

ngth of action movies are related to the viewer rating
e the viewer rating.

ewer rating for all Action movies released during the
µ viewer rating = 0

eased during the yr 2012 is directly proportional to
µ gross collection ≠ 0

hesis test , to establish same we will use two tail test
assuming unequal variance with hypothesized mean

Length in minutes
113.4117647
466.9768271
34

two tail hence we can reject H0 and conclude that H1 is
leased during the yr 2012 is directly proportional to the
ovies have special effects which will be more as per

SS
7056.846996
8353.388298
15410.23529

MS
7056.846996
261.0433843

Standard Error
18.05364614
2.681481989

t Stat
1.144185093
5.19934942

Residuals
12.99865517
5.421870443
1.45088953
0.45088953
16.66249716
-5.520091383
-2.520091383
-2.914287566
-9.914287566
26.69151625
-10.70267993
-11.4910723
18.5089277
0.114731521
0.114731521
-18.88526848
-16.27946466
-21.67366084
20.93214297
17.93214297
-0.46205321
-15.25044558
-23.64464176
14.96116206
7.566965877
19.56696588
-28.82723031
-25.61562267
-19.00981885
-11.40401504
10.8075926

F
Significance F
27.0332344
1.1127E-05

P-value
0.261033735
1.1127E-05

Lower 95%
-16.11736101
8.479961753

Upper 95%
Lower 95.0%
57.43078659 -16.11736101
19.4039619
8.479961753

5.230807868
23.04821932
21.65402314

10.00

To establish the relation mentioned above we use regression analysis on viewer rating and length of the movie. In the output above, length in minutes is the fitted (Y) variable and viewer rating is the explanatory (X) variable; with a single explanatory variable, R Square is the same whichever of the two is treated as dependent. R Square is only 45.8%, which indicates that only 45.8% of the variance of one variable can be explained by the variance of the other; thus the regression model is not very accurate and only partially supports our hypothesis test.
Significance F is 1.1127E-05, far below α = 0.05, so the probability of obtaining such a regression by chance is very low and the model can be considered significant, again supporting our hypothesis.
A further observation concerns the Y intercept: its p-value is 0.26, which is above 0.05, so the intercept is not significantly different from zero.
The fourth and most important inference comes from the residual plot, which does not follow a specific pattern; hence the regression can be considered robust, and the hypothesis that viewer rating is related to the length of the movie partially holds true.
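
The Adjusted R Square reported alongside R Square corrects for the number of observations and predictors. As a small sketch (the function name `adjusted_r_square` is ours), the standard formula reproduces the values in both the action output and the all-movies output:

```python
def adjusted_r_square(r_square, n_obs, n_predictors=1):
    """Adjusted R Square for a linear model with an intercept."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

# R Square and observation counts taken from the two SUMMARY OUTPUT tables.
action = adjusted_r_square(0.457932462, 34)
all_movies = adjusted_r_square(0.878551965, 99)
print(round(action, 6), round(all_movies, 6))
```

With only 34 action titles the adjustment is visibly larger than for the 99-observation model, which is one more reason to read the action result cautiously.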

Upper 95.0%
57.43078659
19.4039619

Action

Title
Madea's Witness Protection
Fun Size
The Three Stooges
One For The Money
That's My Boy
Mirror Mirror
Parental Guidance
Wanderlust
A Thousand Words
For a Good Time, Call...
Damsels in Distress
Think Like a Man
Diary of a Wimpy Kid: Dog Days
Iron Sky
Friends with Kids
Magic Mike
The Campaign
The Five-Year Engagement
Dark Shadows
To Rome with Love
This Is 40*
The Dictator
Celeste and Jesse Forever
Jeff, Who Lives at Home
Project X
Your Sister's Sister
Seeking a Friend for the End of the World
American Reunion
Men in Black 3
Safety Not Guaranteed
The Best Exotic Marigold Hotel
21 Jump Street
Ted
Seven Psychopaths

Budget ($)in Millions First weekend collection($) in Mi
20
25.390575
14
4.101017
30
17.010125
40
11.51579
70
13.453714
85
18.132085
6.5
14.8
35
6.52665
40
6.17628
5.7
0.143935
3
0.058589
12
33.636303
22
14.623599
7
0.03
10
2.017466
7
39.12717
56
26.58846
30
10.61006
150
29.685274
24.8
0.361359
35
11.579175
65
17.435092
8
0.107785
10
0.855709
12
21.051363
0.125
11.579175
10
3.822803
50
21.51408
215
54.592779
0.75
0.097762
10
0.737051
42
36.302612
65
54.415205
15
4.174915

Correlation and Equality of Means

                     Budget ($M)   First weekend ($M)   Gross ($M)    Length (min)   Viewer Rating
Budget ($M)          1
First weekend ($M)   0.613835983   1
Gross ($M)           0.782226864   0.841320894          1
Length (min)         0.334702147   0.29738332           0.267165364   1
Viewer Rating        0.082634184   0.109791556          0.330282543   0.110400891    1

In comedy movies the weekend collection matters. The weekend collection has a strong impact on the gross collection, as the coefficient of correlation is highest (0.841320894). The budget has little relation to the length of the movie. The viewer rating does not matter, because every person has their own taste in comedy.

H0: The first weekend collection has no impact on gross collection for all the comedy movies released during the year 2012.
Mathematically H0 : µ weekend collection - µ gross collection = 0

H1: Gross collection of all the comedy movies released during the year 2012 is directly proportional to the weekend collection.
Mathematically H1 : µ weekend collection - µ gross collection ≠ 0

We take α = 0.05 and use a two-tailed, two-sample t-test assuming unequal variances, with a hypothesized mean difference of zero.

t-Test: Two-Sample Assuming Unequal Variances

                               First weekend collection ($M)   Gross ($M)
Mean                           15.06629285                     98.33441176
Variance                       228.0955534                     19036.42455
Observations                   34                              34
Hypothesized Mean Difference   0
df                             34
t Stat                         -3.498155532
P(T<=t) one-tail               0.000663816
t Critical one-tail            1.690924255
P(T<=t) two-tail               0.001327631
t Critical two-tail            2.032244509

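Since the absolute value of t Stat (3.50) is greater than t Critical two-tail (2.03), and the two-tail p-value (0.001327631) is below α = 0.05, we can reject H0 and conclude that H1 is true, which states that gross collection of all the comedy movies released during the year 2012 is directly proportional to the weekend collection.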

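Regression Model
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.841320894
R Square            0.707820847
Adjusted R Square   0.698690248
Standard Error      75.73546297
Observations        34

ANOVA
             df   SS            MS            F             Significance F
Regression   1    444654.479    444654.479    77.52184532   4.63733E-10
Residual     32   183547.5313   5735.860352
Total        33   628202.0102

                                Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept                       -17.46393553   18.48447165      -0.944789543   0.351845885   -55.11557216   20.1877011
First weekend collection ($M)   7.685921708    0.872939017      8.804649074    4.63733E-10   5.907803117    9.464040298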
RESIDUAL OUTPUT
Observation
Predicted Gross ($) in Millions
1
177.686036
2
14.05616006
3
113.2745535
4
71.04552481
5
85.94025695
6
121.8978502
7
96.28770575
8
32.69938539
9
30.006469
10
-16.35766239
11
-17.01362506
12
241.0620559
13
94.93190147
14
-17.23335788
15
-1.957849803
16
283.2644297
17
186.8928864
18
64.08415495
19
210.6947563
20
-14.68655854
21
71.53269696
22
116.5408166
23
-16.63550846
24
-10.88702315
25
144.3351923
26
71.53269696
27
11.91782903
28
147.891599
29
402.1318897
30
-16.71254445
31
-11.79901925
32
261.5550981

33
34

400.7670698
14.6241343

[Figure: First weekend collection ($) in Millions Residual Plot. x-axis: First weekend collection ($) in Millions; y-axis: Residuals.]

Gross ($) in Millions
65.6
9.2
53
36.8
57.7
162.8
29.3
21.4
20.5
1.2
1.3
99.19
76.5
8
12
165
103
53.7
238.7
73
20.7
177.5
26
4.5
101
1.1
9.6
234.7
624
4
134
201.58
501.7
15.1

Length in minutes
114
90
92
91
114
106
105
98
91
85
99
123
94
93
100
110
85
124
150
112
133
83
92
83
88
90
101
113
106
86
124
109
106
110

Viewer Rating
3.9
5
5.1
5.1
5.5
5.5
5.6
5.6
5.6
5.7
6
6
6
6.1
6.1
6.2
6.2
6.3
6.3
6.4
6.5
6.5
6.6
6.6
6.6
6.7
6.7
6.9
6.9
7.1
7.2
7.2
7.3
7.8

First weekend collection($) inGross
Mi ($) in Millions Length in minutes Viewer Rating
1
0.841320894
0.29738332
0.109791556

1
0.267165364
0.330282543

atters. The weekend collection will have a strong

1
0.110400891

1

atters. The weekend collection will have a strong
ent of correlation is highest . The budget does not
ewer rating does not matter because every person has

n gross collection for all the comedy movies released
µ gross collection = 0

es released during the yr 2012 is directly proportional
µ gross collection ≠ 0

hesis test , to establish same we will use two tail test
assuming unequal variance with hypothesized mean

Gross ($) in Millions
98.33441176
19036.42455
34

two tail hence we can reject H0 and conclude that H1 is
the comedymovies released during the yr 2012 is

SS
444654.479
183547.5313
628202.0102

MS
444654.479
5735.860352

Standard Error
18.48447165
0.872939017

t Stat
-0.944789543
8.804649074

Residuals
-112.086036
-4.856160057
-60.27455346
-34.24552481
-28.24025695
40.90214982
-66.98770575
-11.29938539
-9.506468997
17.55766239
18.31362506
-141.8720559
-18.43190147
25.23335788
13.9578498
-118.2644297
-83.89288636
-10.38415495
28.00524369
87.68655854
-50.83269696
60.95918345
42.63550846
15.38702315
-43.33519233
-70.43269696
-2.317829035
86.80840104
221.8681103
20.71254445
145.7990192
-59.97509809

F
Significance F
77.52184532
4.63733E-10

P-value
0.351845885
4.63733E-10

Lower 95%
-55.11557216
5.907803117

Upper 95%
Lower 95.0%
20.1877011 -55.11557216
9.464040298
5.907803117

100.9329302
0.475865701

To establish the relation mentioned above we use regression analysis, taking gross collection as the dependent variable and first weekend collection as the explanatory variable; hence we plot gross collection on the Y axis and first weekend collection on the X axis. The regression table shows that R Square is 70.8%, which indicates that 70.8% of the variance of gross collection can be explained by the variance of first weekend collection. The regression model therefore fits well and supports our hypothesis test.
Significance F is 4.63733E-10, far below α = 0.05, so the probability of obtaining such a regression by chance is negligible and the model can be considered significant, again supporting our hypothesis.
A further observation concerns the Y intercept: its p-value is 0.35, which is above 0.05, so the intercept is not significantly different from zero.
The fourth and most important inference comes from the residual plot, which does not follow a specific pattern; hence the regression can be considered robust, and the hypothesis that gross collection is a dependent variable of first weekend collection holds true.
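
The t Stat of the slope and the F statistic in the comedy output are tied together: the t Stat is the coefficient divided by its standard error, and in a regression with a single explanatory variable F equals the square of that t Stat. A minimal check using only the reported numbers (variable names are ours):

```python
# Slope and standard error from the comedy SUMMARY OUTPUT.
SLOPE = 7.685921708
SLOPE_STD_ERROR = 0.872939017

t_slope = SLOPE / SLOPE_STD_ERROR   # t Stat of the slope
f_from_t = t_slope ** 2             # equals F when there is one predictor

print(round(t_slope, 4), round(f_from_t, 2))
```

This is why the slope's P-value and Significance F are the same number (4.63733E-10) in the table.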

Upper 95.0%
20.1877011
9.464040298

Comedy

Title
Good Deeds
Darling Companion
Won't Back Down
Cosmopolis
W.E.
Big Miracle
Deadfall
The Odd Life of Timothy Green
Compliance
Arbitrage
The Words
Salmon Fishing in the Yemen
Smashed
People Like Us
Anna Karenina
Hitchcock*
Beasts of the Southern Wild
We Need to Talk About Kevin
Flight
The Impossible
The Master
Silver Linings Playbook
Argo
The Perks of Being a Wallflower
Lincoln
Life of Pi

Budget ($)in Millions First weekend collection($) in Mi
14
15.583924
12
0.039962
19
2.60337
20
0.070339
29
0.047074
30
7.760205
12
0.019391
25
10.822903
10
0.016427
13
2.002165
6
4.750894
14.5
0.225894
5
0.026943
16
4.255423
50
0.32069
15
0.287715
1.8
0.169702
7
0.024587
31
24.900566
45
0.4
35
0.736311
21
0.443003
44.5
19.458109
13
0.228359
60
0.944308
120
22.451514

Correlation and Equality of Means

                     Budget ($M)   First weekend ($M)   Gross ($M)    Length (min)   Viewer Rating
Budget ($M)          1
First weekend ($M)   0.502686058   1
Gross ($M)           0.85712907    0.726299275          1
Length (min)         0.616787474   0.328596164          0.479212016   1
Viewer Rating        0.384348525   0.110079505          0.485727023   0.279065717    1

In Drama movies the gross collection is dependent on the budget of the movie, as the coefficient of correlation is highest (0.85712907).

H0: The gross collection for all the drama movies released during the year 2012 is independent of the budget of the movie.
Mathematically H0 : µ budget - µ gross collection = 0

H1: Gross collection of all the drama movies released during the year 2012 is directly proportional to the budget of the movie.
Mathematically H1 : µ budget - µ gross collection ≠ 0

We take α = 0.05 and use a two-tailed, two-sample t-test assuming unequal variances, with a hypothesized mean difference of zero.

t-Test: Two-Sample Assuming Unequal Variances

                               Budget ($M)    Gross ($M)
Mean                           25.72307692    40.43384615
Variance                       591.2858462    3154.842417
Observations                   26             26
Hypothesized Mean Difference   0
df                             34
t Stat                         -1.225549156
P(T<=t) one-tail               0.114395125
t Critical one-tail            1.690924255
P(T<=t) two-tail               0.22879025
t Critical two-tail            2.032244509

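Here the absolute value of t Stat (1.23) is less than t Critical two-tail (2.03), and the two-tail p-value (0.22879025) is greater than α = 0.05, so we cannot reject H0: the test does not show a significant difference between the mean budget and the mean gross collection of the drama movies released during the year 2012. Support for H1, that gross collection is directly proportional to the budget of the movie, therefore has to come from the regression model rather than from this test.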
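
The two-tailed decision rule is easy to misapply when t Stat is negative, so it helps to state it explicitly: H0 is rejected only when the absolute value of t Stat exceeds t Critical two-tail. The sketch below applies that rule to the four t-tests reported in this project; the function name and dictionary labels are our own.

```python
def reject_h0(t_stat, t_critical_two_tail):
    """Two-tailed rule: reject H0 only when |t| exceeds the critical value."""
    return abs(t_stat) > t_critical_two_tail

# (t Stat, t Critical two-tail) as reported in the four t-test tables.
tests = {
    "all movies (weekend vs gross)": (-5.000790229, 1.983731003),
    "action (rating vs length)": (-28.77296391, 2.034515297),
    "comedy (weekend vs gross)": (-3.498155532, 2.032244509),
    "drama (budget vs gross)": (-1.225549156, 2.032244509),
}

decisions = {name: reject_h0(t, crit) for name, (t, crit) in tests.items()}
for name, decision in decisions.items():
    print(f"{name}: {'reject H0' if decision else 'fail to reject H0'}")
```

Only the drama test fails to reject H0, which matches its two-tail p-value of 0.22879025 being above 0.05.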

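Regression Model
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.85712907
R Square            0.734670243
Adjusted R Square   0.723614836
Standard Error      29.52882724
Observations        26

ANOVA
             df   SS            MS            F             Significance F
Regression   1    57944.2211    57944.2211    66.45348039   2.25863E-08
Residual     24   20926.83932   871.9516383
Total        25   78871.06042

              Coefficients   Standard Error   t Stat         P-value       Lower 95%     Upper 95%
Intercept     -10.49446037   8.518614798      -1.231944467   0.229903996   -28.0760172   7.087096462
Budget ($M)   1.979868376    0.242872002      8.151900416    2.25863E-08   1.4786052     2.481131551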
RESIDUAL OUTPUT
Observation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

Predicted Gross ($) in Millions
17.22369689
13.26396014
27.12303877
29.10290714
46.92172252
48.9015909
13.26396014
39.00224902
9.304223388
15.24382851
1.384749886
18.21363108
-0.59511849
21.18343364
88.49895841
19.20356527
-6.930697291
3.364618261
50.88145927
78.59961653
58.80093278
31.08277552
77.60968234
15.24382851
108.2976422
227.0897447

[Figure: Budget ($) in Millions Residual Plot. x-axis: Budget ($) in Millions; y-axis: Residuals.]

Gross ($) in Millions   Length in minutes   Viewer Rating
35                      111                 4.3
7.9                     103                 4.6
5.2                     121                 4.9
6.5                     109                 5.3
0.89                    119                 5.4
24                      107                 6.3
0.45                    95                  6.4
51.6                    104                 6.5
31                      90                  6.7
23                      100                 6.7
11.4                    96                  6.8
34.5                    107                 6.8
2.9                     81                  7
12.4                    114                 7.1
26.64                   130                 7.1
4.5                     98                  7.3
11                      93                  7.5
6                       112                 7.5
95.5                    139                 7.5
60.3                    113                 7.7
18.8                    143                 7.8
32                      122                 8.2
159.6                   120                 8.2
28                      102                 8.3
122.2                   150                 8.3
240                     127                 8.3

                                    First weekend collection($) in Mi   Gross ($) in Millions   Length in minutes   Viewer Rating
First weekend collection($) in Mi   1
Gross ($) in Millions               0.726299275                         1
Length in minutes                   0.328596164                         0.479212016             1
Viewer Rating                       0.110079505                         0.485727023             0.279065717         1

dent on the budget of the movie as the coefficient of

movies released during the year 2012 is independent of

movies released during the year 2012 is independent of

s collection = 0
released during the yr 2012 is directly proportional

s collection ≠ 0

hesis test , to establish same we will use two tail test
assuming unequal variance with hypothesized mean

               Gross ($) in Millions
Mean           40.43384615
Variance       3154.842417
Observations   26

wo tail hence we can reject H0 and conclude that H1
l the drama movies released during the yr 2012 is

ANOVA
             SS            MS            F             Significance F
Regression   57944.2211    57944.2211    66.45348039   2.25863E-08
Residual     20926.83932   871.9516383
Total        78871.06042

                         Standard Error   t Stat         P-value       Lower 95%     Upper 95%     Lower 95.0%
Intercept                8.518614798      -1.231944467   0.229903996   -28.0760172   7.087096462   -28.0760172
Budget ($) in Millions   0.242872002      8.151900416    2.25863E-08   1.4786052     2.481131551   1.4786052

Residuals
17.77630311
-5.363960139
-21.92303877
-22.60290714
-46.03172252
-24.9015909
-12.81396014
12.59775098
21.69577661
7.756171485
10.01525011
16.28636892
3.49511849
-8.783433641
-61.85895841
-14.70356527
17.93069729
2.635381739
44.61854073
-18.29961653
-40.00093278
0.917224481
81.99031766
12.75617149
13.90235784
12.9102553

To establish the relation mentioned above we use regression analysis, taking Gross collection as the dependent variable and Budget as the explanatory variable; hence we plot Gross collection on the Y axis and Budget on the X axis. As per the regression table, R square is 0.7347, which indicates that about 73.5% of the variance in gross collection is explained by Budget; the model therefore fits the data well and supports our hypothesis.
The Significance F is 2.26E-08, far below α = 0.05, which indicates that the probability of obtaining such a regression by chance is negligible; hence this model can be considered significant, again supporting our hypothesis.
Another observation can be drawn from the P-values of the coefficients: the P-value of the Budget coefficient is also 2.26E-08, so the slope of 1.98 is significant, whereas the P-value of the Y intercept is 0.23, which is greater than 0.05, so the intercept is not significantly different from zero.
The fourth and most important inference comes from the residual plot: the residuals do not follow a specific pattern, hence the regression can be considered robust and the hypothesis that Gross collection is dependent on Budget holds true.
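
The regression statistics quoted above are linked by simple arithmetic, which can be checked directly from the reported sums of squares. The following is a minimal sketch, assuming only the standard ANOVA identities for a one-predictor regression; all input numbers are copied from the output in this worksheet, and the variable names are illustrative.

```python
import math

# Values copied from the ANOVA and coefficient output for the drama movies.
ss_regression = 57944.2211
ss_residual = 20926.83932
df_regression = 1
df_residual = 24
slope = 1.979868376
slope_se = 0.242872002

ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total            # share of variance explained
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual           # overall F statistic
standard_error = math.sqrt(ms_residual)        # standard error of the regression
slope_t = slope / slope_se                     # t Stat of the Budget coefficient

print(f"R Square       = {r_square:.6f}")
print(f"F              = {f_stat:.4f}")
print(f"Standard Error = {standard_error:.4f}")
print(f"t Stat (slope) = {slope_t:.4f}")
```

With a single predictor the F statistic equals the square of the slope's t Stat, which is why Significance F and the P-value of the Budget coefficient are the same number.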

Upper 95.0%
7.087096462
2.481131551

Drama

Title          Budget ($) in Millions   First weekend collection($) in Mi
Total Recall   125                      25.577758
Chronicle      15                       22.004098
Prometheus     130                      51.050101
Cloud Atlas    102                      9.612247

Correlation and Equality of Means
                                 Budget ($) in Millions
Budget ($) in Millions           1
First week collection($) in Mi   0.386608264
Gross ($) in Millions            0.510235419
Length in minutes                0.591570074
Viewer Rating                    -0.109305766

In Sci-Fi movies, the length and the budget of the movie are correlated. The public always wants Sci-Fi movies to be longer compared to other genre movies; this is due to the special effects in sci-fi movies, which create excitement for the public. Sci-fi movies are made on high budgets, and more money invested in sci-fi effects leads to a high budget.

H0: Budget of all Sci-Fi movies released during the year 2012 is independent of the length of the movie.
Mathematically H0 : µ length - µ budget = 0

H1: Budget of all Sci-Fi movies released during the year 2012 is directly proportional to the length of the movie.
Mathematically H1 : µ length - µ budget ≠ 0

Let us consider α = 0.05 for this hypothesis test. We will use a two-tailed t-test for two samples assuming unequal variances, with a hypothesized mean difference of zero.

t-Test: Two-Sample Assuming Unequal Variances
                               Budget ($) in Millions
Mean                           93
Variance                       2852.666667
Observations                   4
Hypothesized Mean Difference   0
df                             5
t Stat                         -0.961115181
P(T<=t) one-tail               0.190317704
t Critical one-tail            2.015048373
P(T<=t) two-tail               0.380635408
t Critical two-tail            2.570581836

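As we can see, the absolute value of t Stat (0.9611) is less than t Critical two-tail (2.5706) and the two-tail p-value (0.3806) is greater than α = 0.05, hence we cannot reject H0 on the basis of this test: the mean budget and the mean length of the Sci-Fi movies released during the year 2012 do not differ significantly. This comparison of means does not by itself show whether Budget is directly proportional to the length of the movie; that relation is examined with the regression model.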

Regression Model
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.591570074
R Square            0.349955153
Adjusted R Square   0.02493273
Standard Error      52.74032518
Observations        4

ANOVA
             df
Regression   1
Residual     2
Total        3

                    Coefficients
Intercept           -15.30259806
Length in minutes   0.873408049

RESIDUAL OUTPUT
Observation   Predicted Budget ($) in Millions
1             87.75955171
2             57.19026999
3             93
4             134.0501783

[Figure: Length in minutes Residual Plot. X axis: Length in minutes; Y axis: Residuals.]

Gross ($) in Millions   Length in minutes   Viewer Rating
198                     118                 6.3
126                     83                  7.1
402.52                  124                 7.2
65.6                    171                 8.1

                                    First weekend collection($) in Mi   Gross ($) in Millions   Length in minutes   Viewer Rating
First weekend collection($) in Mi   1
Gross ($) in Millions               0.990390527                         1
Length in minutes                   -0.319880107                        -0.205474061            1
Viewer Rating                       -0.360688156                        -0.34543705             0.648028511         1
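
Because the Sci-Fi sheet has only four movies, the correlation coefficients can be recomputed by hand from the raw columns. The following is a minimal sketch, assuming the Pearson product-moment formula; the helper name `pearson` is hypothetical, and the lists are the four rows of the Sci-Fi data in worksheet order.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Sci-Fi movies in worksheet order: Total Recall, Chronicle, Prometheus, Cloud Atlas.
budget = [125, 15, 130, 102]
first_weekend = [25.577758, 22.004098, 51.050101, 9.612247]
gross = [198, 126, 402.52, 65.6]
length = [118, 83, 124, 171]
rating = [6.3, 7.1, 7.2, 8.1]

print(f"Budget vs Length        : {pearson(budget, length):.6f}")
print(f"Budget vs Gross         : {pearson(budget, gross):.6f}")
print(f"First weekend vs Gross  : {pearson(first_weekend, gross):.6f}")
print(f"Length vs Viewer Rating : {pearson(length, rating):.6f}")
```

With four data points each coefficient is sensitive to a single movie, so these values are best read as descriptive rather than as evidence of a general relationship.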

of movie is correlated. Public always want Sci-Fi movies length to be longer compare to other
l effect in the sci-fi movies which creates an excitement for the public. Sci-fi movies are made
d on sci fi effects leads to high budget.

d during the year 2012 is independent of the length of the movie.

during the year 2012 directly proportional of the length of the movie.
budget ≠ 0

hypothesis test , to establish same we will use two tail test by using t-statistic using t test of
with hypothesized mean difference as zero.

               Length in minutes
Mean           124
Variance       1308.666667
Observations   4

tical two tail hence we can reject H0 and conclude that H1
Fi movies released during the year 2012 directly


ANOVA
             SS          MS          F             Significance F
Regression   2994.9162   2994.9162   1.076710798   0.408429926
Residual     5563.0838   2781.5419
Total        8558

                    Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept           107.6529958      -0.142147443   0.899990505   -478.4960544   447.8908582
Length in minutes   0.841720018      1.03764676     0.408429926   -2.748220882   4.49503698

Residuals
37.24044829
-42.19026999
37
-32.0501783


To establish the relation mentioned above we use regression analysis, taking Budget as the dependent variable and the length of the movie as the explanatory variable; hence we plot Budget on the Y axis and the length of the movie on the X axis. As per the regression table, R square is only 0.35, which indicates that only about 35% of the variance in Budget is explained by the length of the movie (Adjusted R Square is only 0.025); the regression model is therefore not very accurate and only weakly supports our hypothesis.
The Significance F is 0.41, well above α = 0.05, which indicates that the probability of obtaining such a regression by chance is about 41%; hence this model cannot be considered significant.
Another observation can be drawn from the P-values of the coefficients: the P-value of the Y intercept is 0.90 and the P-value of the length coefficient is 0.41, both greater than 0.05, so neither coefficient is significantly different from zero.
The fourth inference comes from the residual plot: the residuals do not follow a specific pattern, but with only four observations this says little about robustness, and the hypothesis that Budget is dependent on the length of the movie holds at best partially.
***The small number of observations (only four movies) has produced these differing results, and hence any interpretation drawn from such a small pool of data is unreliable.***
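
The Sci-Fi regression can be reproduced from the four data points themselves, which also shows how little data the fit rests on. The following is a minimal sketch, assuming ordinary least squares with Budget as the dependent variable and Length as the single predictor; variable names are illustrative, and the data are the Sci-Fi rows in worksheet order.

```python
# Sci-Fi movies in worksheet order: Total Recall, Chronicle, Prometheus, Cloud Atlas.
length = [118, 83, 124, 171]   # Length in minutes (X)
budget = [125, 15, 130, 102]   # Budget ($) in Millions (Y)

n = len(length)
mean_x = sum(length) / n
mean_y = sum(budget) / n

sxx = sum((x - mean_x) ** 2 for x in length)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(length, budget))
syy = sum((y - mean_y) ** 2 for y in budget)   # total sum of squares

slope = sxy / sxx
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * x for x in length]
residuals = [y - p for y, p in zip(budget, predicted)]

ss_residual = sum(r * r for r in residuals)
ss_regression = syy - ss_residual
r_square = ss_regression / syy
f_stat = ss_regression / (ss_residual / (n - 2))  # 1 regression df, n - 2 residual df

print(f"Intercept = {intercept:.6f}, slope = {slope:.6f}")
print(f"R Square  = {r_square:.6f}, F = {f_stat:.6f}")
print("Residuals =", [round(r, 4) for r in residuals])
```

Only two residual degrees of freedom remain after fitting the intercept and slope, which is consistent with the wide confidence intervals reported for both coefficients.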

                    Lower 95.0%    Upper 95.0%
Intercept           -478.4960544   447.8908582
Length in minutes   -2.748220882   4.49503698


Sci-Fi