
BUSINESS RESEARCH METHODOLOGY LAB

(Using MS Excel and R Studio)

PRACTICAL FILE
Submitted for partial fulfillment for the award of the Degree
of

BACHELOR OF BUSINESS ADMINISTRATION


{BBA (G) 2022 – 2025}

Under the guidance of

Dr. AANCHAL AGGARWAL


Submitted by

“YASH AGGARWAL”
“08329801722”

VIVEKANANDA SCHOOL OF BUSINESS STUDIES


VIVEKANANDA INSTITUTE OF PROFESSIONAL STUDIES-TC

(Affiliated to Guru Gobind Singh Indraprastha University)

YASH AGGARWAL 08329801722 1


INDEX

TOPIC
• Descriptive statistics
• Histogram frequency distribution
• Correlation (positive, negative, zero)
HYPOTHESIS TESTING
• One sample t test using dummy (one-tail)
• One sample t test using dummy (two-tail)
• Two sample t test (one-tail)
• Two sample t test (two-tail)
• Paired sample t test (one-tail)
• Paired sample t test (two-tail)
• Two sample z test
• F test
• ANOVA – Single Factor
• ANOVA – Two Factor without replication
• ANOVA – Two Factor with replication
• Chi-square test
• Regression
HYPOTHESIS TESTING in R Studio
• How to install R Studio
• Introduction to R Studio
• Import of Data Sheet in R Studio
• Descriptive statistics
• Correlation
• Hypothesis Testing: One sample t test (one-tail)
• Hypothesis Testing: Two sample t test (alpha=10%)
• Hypothesis Testing: Paired sample t test
• Hypothesis Testing: F test
• Hypothesis Testing: One-way ANOVA



Descriptive Analysis
Step 1: Go to Data -> Data Analysis -> Descriptive Statistics

Step 2: Enter the input range, tick "Labels in first row" if the selection includes a heading,
select an output range and tick "Summary statistics"



Step 3: Click ok.



Histogram Analysis
Step 1: Go to Data tab -> Data Analysis -> Histogram

Step 2: Enter the input range and bin range, then tick Labels, Pareto, Cumulative Percentage
and Chart Output



Step 3: Enter output range and click ok.



Correlation:
The correlation coefficient (a value between -1 and +1) tells you how strongly two variables
are related to each other
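
For reference, the coefficient reported by Excel's Correlation tool is Pearson's r. In the usual notation (the symbols x_i, y_i, their means and the sample size n are introduced here for illustration and do not appear in the original file), it can be written as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or no linear relationship.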

a. Positive Correlation –

What is the correlation between advertisement of a product in a month and its sales in crores?
Advertisement in month   Sales in crores
32 5
54 10
67 15
65 20
98 24
112 34
101 25
34 34

Step 1:



Step2:

Step3:

Result:
                         Advertisement in month   Sales in crores
Advertisement in month   1
Sales in crores          0.485149134              1



Inference:
Here r =+0.48, therefore there is a positive correlation between advertisements and sales.
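
The same coefficient can be cross-checked outside Excel. The following is a minimal R sketch, assuming the eight pairs from the table above are typed in by hand; the variable names advertisement and sales are chosen here for illustration.

```r
# Data copied from the advertisement/sales table above
advertisement <- c(32, 54, 67, 65, 98, 112, 101, 34)
sales <- c(5, 10, 15, 20, 24, 34, 25, 34)

# Pearson correlation coefficient (the default method of cor)
r <- cor(advertisement, sales)
print(round(r, 4))  # 0.4851
```

The rounded value agrees with the Excel result of 0.485149134 shown above.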

b. Negative Correlation –
What is the correlation between the number of cigarettes smoked in a week and life expectancy?
Cigarettes   Life expectancy
5 80
23 78
25 60
48 53
17 85
8 84
4 73
26 79
11 81
19 75
14 68
35 72
29 58
4 92
23 65

Step 1:

Step 2:

Step 3:

Result:
                  Cigarettes    Life expectancy
Cigarettes        1
Life expectancy   -0.71343017   1

Inference
Here r = -0.71, therefore there is a negative correlation between number of cigarettes in a
week and life expectancy

c. No/Zero Correlation –
What is the correlation between shoe size and IQ level?
Shoe size   IQ level
1 4
2 5
3 4
4 5
5 4
6 5
7 4

Step 1:

Step 2:

Step 3:

Result:
            Shoe size   IQ level
Shoe size   1
IQ level    0           1

Inference
Here r = 0, therefore there is no or zero correlation between shoe size and IQ level

Hypothesis Testing
(i) T-Test
One Sample t-test using dummy (one tailed):
Problem: Suppose that we want to hypothesize that the mean number of TV hours watched
per week is greater than 28.5 at α=0.05
Hours Dummy
25.7 0
38.5 0
29.3 0
25.1
30.6
34.6
30
39
33.7
31.6
25.9
34.4
26.9
23
31.1
29.3
34.5
31.2
33.2
30.2
36.5
37.5
27.6
24.6
23.9
27
29.5
30
29.6

HYPOTHESIS TESTING:
Null Hypothesis: The mean number of TV hours watched per week is not greater than 28.5
Alternate Hypothesis: The mean number of TV hours watched per week is greater than 28.5
H0: µ ≤ 28.5
H1: µ > 28.5
Step 1:

Step 2:

Step 3:

Output:
t-Test: Two-Sample Assuming Equal Variances

Hours Dummy
Mean 30.48275862 0
Variance 19.13362069 0
Observations 29 3
Pooled Variance 17.85804598
Hypothesized Mean Difference 28.5
df 30
t Stat 0.773637505
P(T<=t) one-tail 0.222599519
t Critical one-tail 1.697260887
P(T<=t) two-tail 0.445199038
t Critical two-tail 2.042272456

DECISION RULE:
If t-stat is greater than t-critical, reject null hypothesis. If p(t) is less than α, reject Null
hypothesis

INFERENCE:
Since t stat (0.77) is lesser than t Critical (1.69), accept Null hypothesis.
Since P (0.22) is greater than α (0.05), accept Null hypothesis.

CONCLUSION:
The mean number of TV hours watched per week is not greater than 28.5
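
As a cross-check that is not part of the Excel steps above, the one-sample statistic can also be computed directly from its textbook definition, t = (mean - µ0) / (s / √n) with n - 1 degrees of freedom. The R sketch below assumes the 29 Hours values are typed in by hand; the names hours and mu0 are chosen here for illustration. Note that this formula uses only the Hours column, whereas the Excel output above pools the variance with the Dummy column and uses 30 degrees of freedom, so the two t values are not expected to coincide.

```r
# Hours column copied from the table above (29 observations)
hours <- c(25.7, 38.5, 29.3, 25.1, 30.6, 34.6, 30, 39, 33.7, 31.6,
           25.9, 34.4, 26.9, 23, 31.1, 29.3, 34.5, 31.2, 33.2, 30.2,
           36.5, 37.5, 27.6, 24.6, 23.9, 27, 29.5, 30, 29.6)
mu0 <- 28.5  # hypothesized mean under H0

# One-sample t statistic computed from its definition
n <- length(hours)
t_stat <- (mean(hours) - mu0) / (sd(hours) / sqrt(n))

# Upper-tail p-value, because H1 is "greater than"
p_one_tail <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# The built-in function performs the same calculation
res <- t.test(hours, mu = mu0, alternative = "greater")
print(c(t = t_stat, df = n - 1, p = p_one_tail))
```

If the direct calculation and the dummy-column output lead to different decisions, the direct one-sample calculation is the one that matches the hypotheses H0: µ ≤ 28.5 and H1: µ > 28.5 as stated.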

(ii) T-test
One sample t-test using dummy (two tailed)
Problem: Test whether there is a significant difference between the mean age of the population
and the hypothesized mean age of 40 (α=0.05).

Age Dummy
42 0
76 0
56 0
67
65
65
89
45
45
65
78
55
44
65
76
89
54
56
56
76
45

HYPOTHESIS TESTING:
Null Hypothesis: There is no significant difference between the population mean age and the hypothesized mean age
Alternate Hypothesis: There is a significant difference between the population mean age and the hypothesized mean age
H0: µ = 40
H1: µ ≠ 40

Step1:

Step2:

Step 3:

Output:
t-Test: Two-Sample Assuming Equal Variances

Age Dummy
Mean 62.33333333 0
Variance 208.6333333 0
Observations 21 3
Pooled Variance 189.6666667
Hypothesized Mean Difference 40
df 22
t Stat 2.627378828
P(T<=t) one-tail 0.007690983
t Critical one-tail 1.717144374
P(T<=t) two-tail 0.015381965
t Critical two-tail 2.073873068

DECISION RULE:
If t-stat is greater than t-critical, reject null hypothesis. If p(t) is less than α, reject Null
hypothesis

INFERENCE:
Since t stat (2.62) is greater than t Critical (2.07), reject Null hypothesis.
Since P (0.01) is lesser than α (0.05), reject Null hypothesis.

CONCLUSION:
The population mean age is significantly different from 40 at α=0.05

(iii) T-test
Two sample t-test (one tail)
Problem: To analyse that the time spent by full time students in studying statistics is greater
than the time spent by part time students.

Full time   Part time
3.2 3.1
1.5 3.4
6.5 4.6
0.2 2.8
3.7 2.3
3.3 1.5
1.7 3.8
3.6 9.5
3.8 4.3
5.3 2.7
6.9 1.6
3.6 1.6
1.7 3.2
1.2 4.2
7.2 3.9
3.9 1.2
1.9 0
5.3 0

HYPOTHESIS TESTING:
Null Hypothesis: The time spent by full time students studying statistics is not more than the
time spent by part time students
Alternate Hypothesis: The time spent by full time students studying statistics is more than
the time spent by part time students
H0 = µf ≤ µp; µf - µp≤0
H1 = µf > µp; µf - µp>0
Step 1:

Step 2:

Step 3:

Output:
t-Test: Two-Sample Assuming Equal Variances

Full time Part time


Mean 3.583333 2.983333333
Variance 4.133235 4.566176471
Observations 18 18
Pooled Variance 4.349706
Hypothesized Mean Difference 0
df 34
t Stat 0.863063
P(T<=t) one-tail 0.197075
t Critical one-tail 1.690924
P(T<=t) two-tail 0.39415
t Critical two-tail 2.032245

DECISION RULE:
If t-stat is greater than t-critical, reject null hypothesis. If p(t) is less than α, reject Null
hypothesis

INFERENCE:
Since t stat (0.86) is lesser than t Critical (1.69), accept Null hypothesis.
Since P (0.19) is greater than α (0.05), accept Null hypothesis.

CONCLUSION:
The time spent by full time students studying statistics is not more than the time spent by part
time students

(iv) T-test
Two sample t-test (two tail)
Problem: Two types of drugs were used on 7 patients for reducing their weight. Drug A was
imported and drug B was indigenous. The decrease in the weight after using drugs for six
months was as follows:

Is there a significant difference in the efficiency of the two drugs?


Drug A Drug B
10 8
12 9
13 12
11 14
14 15
12 10
13 9

HYPOTHESIS TESTING:
Null Hypothesis: There is no significant difference in the efficiency of the two drugs
Alternate Hypothesis: There is a significant difference in the efficiency of two drugs
H0 = µa = µb; µa - µb = 0
H1 = µa ≠ µb; µa - µb ≠ 0

Step 1:

Step 2:

Step 3:

Output:

t-Test: Two-Sample Assuming Equal Variances

Drug A Drug B
Mean 12.14285714 11
Variance 1.80952381 7.333333333
Observations 7 7
Pooled Variance 4.571428571
Hypothesized Mean Difference 0
df 12
t Stat 1
P(T<=t) one-tail 0.168524529
t Critical one-tail 1.782287556
P(T<=t) two-tail 0.337049058
t Critical two-tail 2.17881283

DECISION RULE:
If t-stat is greater than t-critical, reject null hypothesis. If p(t) is less than α, reject Null
hypothesis

INFERENCE:
Since t stat (1) is lesser than t Critical (2.17), accept Null hypothesis.
Since P (0.33) is greater than α (0.05), accept Null hypothesis.

CONCLUSION:
There is no significant difference in the efficiency of the two drugs.
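
The same pooled-variance test can be sketched in R. This is an illustrative cross-check, assuming the seven weight-loss values per drug are typed in by hand; the names drug_a and drug_b are chosen here and do not appear in the original workbook.

```r
# Weight decrease for the seven patients on each drug, from the table above
drug_a <- c(10, 12, 13, 11, 14, 12, 13)
drug_b <- c(8, 9, 12, 14, 15, 10, 9)

# var.equal = TRUE requests the pooled-variance (equal variances) test,
# matching Excel's "t-Test: Two-Sample Assuming Equal Variances"
res <- t.test(drug_a, drug_b, var.equal = TRUE, alternative = "two.sided")
print(res)
```

The t statistic, degrees of freedom and two-tail p-value printed by R correspond to the t Stat, df and P(T<=t) two-tail rows of the Excel output.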

T-test of 2 samples
Problem: To determine which subject means differ from one another, we apply the t-test
(two-sample assuming equal variances) to each pair of subjects:

1. Economics and Science
2. Science and History
3. Economics and History

1. Economics and Science –

Null hypothesis: There is no difference between the means of the subjects
Alternate hypothesis: There is a difference between the means of the subjects
H0: µ1 = µ2, µ1 – µ2 = 0
H1: µ1 ≠ µ2, µ1 – µ2 ≠ 0

2. Science and History –

Null hypothesis: There is no difference between the means of the subjects
Alternate hypothesis: There is a difference between the means of the subjects
H0: µ2 = µ3, µ2 – µ3 = 0
H1: µ2 ≠ µ3, µ2 – µ3 ≠ 0

3. Economics and History –

Null hypothesis: There is no difference between the means of the subjects
Alternate hypothesis: There is a difference between the means of the subjects
H0: µ1 = µ3, µ1 – µ3 = 0
H1: µ1 ≠ µ3, µ1 – µ3 ≠ 0

Output:

(i) Economics & Science


DECISION RULE:
If t-stat is greater than t-critical, reject null hypothesis.
If p(t) is less than α, reject Null hypothesis

INFERENCE:
Since the absolute value of t stat (4.43) is greater than t Critical (2.144), reject null hypothesis.
Since p(t) (0.0005) is less than α (0.05), reject null hypothesis.

CONCLUSION:
There is a significant difference between means marks of the students in subjects –
economics and science
(ii) Science & History
DECISION RULE:
If t-stat is greater than t-critical, reject null hypothesis.
If p(t) is less than α, reject Null hypothesis

INFERENCE:
Since t stat (4.95) is greater than t Critical (1.76), reject null hypothesis.
Since p(t)(0.0002) is less than α (0.05), reject null hypothesis.

CONCLUSION:
There is a significant difference between the mean marks of the students in subjects - science and
history

(iii) Economics & History


DECISION RULE:
If t-stat is greater than t-critical, reject null hypothesis.
If p(t) is less than α, reject Null hypothesis

INFERENCE:
Since t stat (1.62) is lesser than t Critical (2.11), accept null hypothesis.
Since p(t) (0.12) is greater than α (0.05), accept null hypothesis.

CONCLUSION:
There is no significant difference between the mean marks of students in subjects - economics and history

(v) Paired sample t-test (one tail)
Problem: Is there sufficient evidence to suggest that the mean time to exhaustion is greater
after chocolate milk than after carbohydrate replacement drink? Use a significance level of
0.1. (Use µcm-µcd in hypothesis statements)

Cyclist   Chocolate Milk   Carbohydrate Replacement Drink
1 50.46 42.9
2 47.08 50.1
3 57.51 41.67
4 46.6 32.69
5 29.1 46.33
6 57.5 31.63
7 23.87 20.61
8 28.65 14.99
9 35.37 20.11

HYPOTHESIS TESTING:
Null Hypothesis: Mean time to exhaustion is not greater after chocolate milk than after
carbohydrate replacement drink
Alternate Hypothesis: Mean time to exhaustion is greater after chocolate milk than after
carbohydrate replacement drink
H0: µcm ≤ µcd or µcm - µcd ≤ 0
H1: µcm > µcd or µcm - µcd > 0

Step 1:

Step 2:

Step 3:

Output:
t-Test: Paired Two Sample for Means

Chocolate Milk Carbohydrate Replacement Drink


Mean 41.79333333 33.44777778
Variance 164.53125 160.9338194
Observations 9 9
Pearson Correlation 0.508406248
Hypothesized Mean Difference 0
df 8
t Stat 1.979280834
P(T<=t) one-tail 0.0415706
t Critical one-tail 1.39681531
P(T<=t) two-tail 0.083141199
t Critical two-tail 1.859548038

DECISION RULE:
If t-stat is greater than t-critical, reject null hypothesis. If p(t) is less than α, reject Null
hypothesis

INFERENCE:
Since t stat (1.97) is greater than t Critical (1.39), reject Null hypothesis.
Since P (0.04) is lesser than α (0.1), reject Null hypothesis.

CONCLUSION:
Mean time to exhaustion is greater after chocolate milk than after carbohydrate replacement
drink.
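
The paired test above can be reproduced with a short R sketch. It assumes the nine pairs of times are typed in by hand; the names chocolate_milk and carb_drink are chosen here for illustration.

```r
# Time to exhaustion for the nine cyclists, from the table above
chocolate_milk <- c(50.46, 47.08, 57.51, 46.6, 29.1, 57.5, 23.87, 28.65, 35.37)
carb_drink <- c(42.9, 50.1, 41.67, 32.69, 46.33, 31.63, 20.61, 14.99, 20.11)

# paired = TRUE tests the mean of the differences (chocolate_milk - carb_drink);
# alternative = "greater" corresponds to H1: µcm - µcd > 0
res <- t.test(chocolate_milk, carb_drink, paired = TRUE, alternative = "greater")
print(res)

# Decision at the 0.1 significance level stated in the problem
alpha <- 0.1
reject_h0 <- res$p.value < alpha
```

Here reject_h0 holds the outcome of the p-value decision rule, and the printed t and p-value correspond to the t Stat and P(T<=t) one-tail rows of the Excel output.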

(vi) Paired sample t-test (two tail)
Problem: Determine whether there is a significant difference between the time to finish the race
when the race is completed with local shoes and with branded shoes.

Athlete   Local shoes   Branded shoes

1 3.2 3.1
2 1.5 3.4
3 6.5 4.6
4 0.2 2.8
5 3.7 2.3
6 3.3 1.5
7 1.7 3.8
8 3.6 9.5
9 3.8 4.3
10 5.3 2.7
11 6.9 1.6
12 3.6 1.6
13 1.7 3.2
14 1.2 4.2
15 7.2 3.9

HYPOTHESIS TESTING:
Null Hypothesis: There is no significant difference between the time to finish the race
when race is completed with local shoes and branded shoes.
Alternate Hypothesis: There is a significant difference between the time to finish the race
when race is completed with local shoes and branded shoes.
H0 = µL=µB, µL- µB=0
H1 = µL≠µB, µL- µB≠0

Step 1:

Step 2:

Step 3:

Output:

t-Test: Paired Two Sample for Means

Local shoes Branded shoes


Mean 3.56 3.5
Variance 4.598285714 3.76
Observations 15 15
Pearson Correlation -0.022160001
Hypothesized Mean Difference 0
df 14
t Stat 0.079506488
P(T<=t) one-tail 0.468877535
t Critical one-tail 1.761310136
P(T<=t) two-tail 0.93775507
t Critical two-tail 2.144786688

DECISION RULE:

If t-stat is greater than t-critical, reject null hypothesis. If p(t) is less than α, reject Null
hypothesis

INFERENCE:
Since t stat (0.07) is lesser than t Critical (2.14), accept Null hypothesis.
Since P (0.93) is greater than α (0.05), accept Null hypothesis.

CONCLUSION:
There is no significant difference between the time to finish the race when the race is
completed with local shoes and with branded shoes

Z-Test
(i) Z-test
Problem: The net annual returns (the returns on investment after deducting all relevant fees)
in percentage are given. Can investors do better by buying mutual funds directly from banks
or other financial institutions than by purchasing mutual funds through brokers? Can we
conclude at the 5% significance level that directly-purchased mutual funds outperform
mutual funds bought through brokers?
Direct Broker
9.33 3.24
6.94 -6.76
16.17 12.8
16.97 11.1
5.94 2.73
12.61 -0.13
3.33 18.22
16.13 -0.8
11.2 -5.75
1.14 2.59
4.68 3.71
3.09 13.15
7.26 11.05
2.05 -3.12
13.07 8.94
0.59 2.74
13.57 4.07
0.35 5.6
2.69 -0.85
18.45 -0.28
4.23 16.4
10.28 6.39
7.1 -1.9
-3.09 9.49
5.6 6.7
5.27 0.19
8.09 12.39
15.05 6.54
13.21 10.92
1.72 -2.15
14.69 4.36
-2.97 -11.07
10.37 9.24
-0.63 -2.67
-0.15 8.97

0.27 1.87
4.59 -1.53
6.38 5.23
-0.24 6.87
10.32 -1.69
10.29 9.43
4.39 8.31
-2.06 -3.99
7.66 -4.44
10.83 8.63
14.48 7.06
4.8 1.57
13.12 -8.44
-6.54 -5.72
-1.06 6.95

HYPOTHESIS TESTING:
Null Hypothesis: Investors do not do better by buying mutual funds directly from banks or
other financial institutions than by purchasing mutual funds through brokers
Alternate Hypothesis: Investors do better by buying mutual funds directly from banks or
other financial institutions than by purchasing mutual funds through brokers
H0: µFI ≤ µB; µFI - µB ≤ 0
H1: µFI > µB; µFI - µB > 0

Step 1:

Step 2:

Step 3:

Output:
z-Test: Two Sample for Means

Direct Broker
Mean 6.6312 3.7232
Known Variance 36.7384 42.4725
Observations 50 50
Hypothesized Mean Difference 0
z 2.310398694
P(Z<=z) one-tail 0.010433046
z Critical one-tail 1.644853627
P(Z<=z) two-tail 0.020866091
z Critical two-tail 1.959963985

DECISION RULE:
If z-stat is greater than z-critical, reject null hypothesis. If p(z) is less than α, reject Null
hypothesis

INFERENCE:
Since z stat (2.31) is greater than z Critical (1.64), reject null hypothesis.
Since P (0.01) is less than α (0.05), reject null hypothesis.

CONCLUSION:
Investors do better by buying mutual funds directly from banks or other financial institutions
than by purchasing mutual funds through brokers
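
Base R has no built-in two-sample z test, so the statistic can be worked out from its formula, z = (mean1 - mean2) / √(var1/n1 + var2/n2). The sketch below is illustrative only: it assumes the means, known variances and sample sizes are taken from the Excel output above rather than recomputed from the 100 raw values, and all variable names are chosen here.

```r
# Summary figures copied from the z-Test output above
mean_direct <- 6.6312
mean_broker <- 3.7232
var_direct <- 36.7384   # "Known Variance" for Direct
var_broker <- 42.4725   # "Known Variance" for Broker
n_direct <- 50
n_broker <- 50

# Two-sample z statistic for a hypothesized mean difference of 0
z <- (mean_direct - mean_broker) / sqrt(var_direct / n_direct + var_broker / n_broker)

# Upper-tail p-value and critical value for a one-tail test at the 5% level
p_one_tail <- pnorm(z, lower.tail = FALSE)
z_critical <- qnorm(0.95)
print(c(z = z, p = p_one_tail, z_critical = z_critical))
```

The three printed values correspond to the z, P(Z<=z) one-tail and z Critical one-tail rows of the Excel output.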

(ii) F-test
Problem: Determine whether variance of class1 is greater than the variance of Class2

Class 1   Class 2
65 76
76 54
65 67
76 65
56 76
45 66

HYPOTHESIS TESTING:
Null Hypothesis: Variance of class1 is not greater than variance of class 2
Alternate Hypothesis: Variance of class1 is greater than variance of class 2
H0 = Var1≤Var2
H1 = Var1>Var2

Step 1:

Step 2:

Step 3:

Output:
F-Test Two-Sample for Variances

Class1 Class2
Mean 63.83333333 67.33333333
Variance 142.9666667 67.06666667
Observations 6 6
df 5 5
F 2.131709742
P(F<=f) one-tail 0.212888468
F Critical one-tail 5.050329058

DECISION RULE:
If f-stat is greater than f-critical, reject null hypothesis.
If p(f) is less than α, reject Null hypothesis

INFERENCE:
Since f stat (2.13) is lesser than f Critical (5.05), accept null hypothesis.
Since P (0.21) is more than α (0.05), accept null hypothesis.

CONCLUSION:
Variance of class1 is not greater than variance of class 2
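
The variance comparison can be sketched in R with var.test from the stats package. The snippet assumes the six marks per class are typed in by hand; the names class1 and class2 are chosen here for illustration.

```r
# Marks from the table above
class1 <- c(65, 76, 65, 76, 56, 45)
class2 <- c(76, 54, 67, 65, 76, 66)

# F = var(class1) / var(class2); alternative = "greater" matches H1: Var1 > Var2
res <- var.test(class1, class2, alternative = "greater")
print(res)

# Critical value for a one-tail test at the 5% level with (5, 5) degrees of freedom
f_critical <- qf(0.95, df1 = 5, df2 = 5)
```

The F statistic, p-value and f_critical correspond to the F, P(F<=f) one-tail and F Critical one-tail rows of the Excel output.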

(i) ANOVA Test
ANOVA-Single Factor
Problem: To test whether there is a significant difference between the mean marks of the students in
subjects - economics, science and history

Economics   Science   History
42          69        35
53          54        40
49          58        53
53          64        42
43          64        50
44          55        39
45          56        55
52                    39
54                    40

HYPOTHESIS TESTING:
Null Hypothesis: There is no significant difference between the mean marks of the students in
subjects - economics, science and history
Alternate Hypothesis: There is a significant difference between the mean marks of the students
in subjects - economics, science and history
H0: µe = µs = µh
H1: at least one of the means is different

Step 1:

Step 2:

Step 3:

Output:
Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Economics 9 435 48.33333 23.5
Science 7 420 60 32.33333
History 9 393 43.66667 50.5

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 1085.84 2 542.92 15.19623 7.16E-05 3.443357
Within Groups 786 22 35.72727

Total 1871.84 24

DECISION RULE:
If f-stat is greater than f-critical, reject null hypothesis.
If p(f) is less than α, reject Null hypothesis

INFERENCE:
Since f stat (15.19623) is greater than f Critical (3.443357), reject null hypothesis.
Since p (f)(7.16E-05) is less than α (0.05), reject null hypothesis.

CONCLUSION:
There is a significant difference between means marks of the students in subjects -
economics, science and history
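
A single-factor ANOVA on the same marks can be sketched in R with aov. The snippet assumes the marks are typed in by hand, with 9, 7 and 9 observations for the three subjects as in the SUMMARY table; the variable names are chosen here for illustration.

```r
# Marks per subject, from the table above (Science has 7 observations)
economics <- c(42, 53, 49, 53, 43, 44, 45, 52, 54)
science <- c(69, 54, 58, 64, 64, 55, 56)
history <- c(35, 40, 53, 42, 50, 39, 55, 39, 40)

# Stack the marks into one vector with a matching grouping factor
marks <- c(economics, science, history)
subject <- factor(rep(c("Economics", "Science", "History"),
                      times = c(length(economics), length(science), length(history))))

# One-way ANOVA: does the mean mark differ between subjects?
fit <- aov(marks ~ subject)
anova_table <- summary(fit)[[1]]
print(anova_table)

f_stat <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]
```

The first row of anova_table corresponds to the Between Groups row of the Excel output, and the Residuals row corresponds to Within Groups.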

(ii) ANOVA TEST
ANOVA- Two Factor without replication
Problem: To test whether or not marks of students differ with respect to student and subject
both.

student economics science history


a 42 69 35
b 53 54 40
c 49 58 53
d 53 64 42
e 43 64 50

HYPOTHESIS TESTING:
Row wise:
Null Hypothesis: There is no significant difference in marks of students.
Alternate Hypothesis: There is significant difference in marks of students.

Column Wise:
Null Hypothesis: There is no significant difference in marks for three subjects- Economics,
Science and History.
Alternate Hypothesis: There is significant difference in marks for three subjects-
Economics, Science and History.

Step 1:

Step 2:

Step 3:

Output:
Anova: Two-Factor Without Replication

SUMMARY Count Sum Average Variance
a 3 146 48.66667 322.3333
b 3 147 49 61
c 3 160 53.33333 20.33333
d 3 159 53 121
e 3 157 52.33333 114.3333

economics 5 240 48 28
science 5 309 61.8 34.2
history 5 220 44 54.5

ANOVA
Source of Variation SS df MS F P-value F crit
Rows 60.93333 4 15.23333 0.300263 0.869889 3.837853
Columns 872.1333 2 436.0667 8.595269 0.010172 4.45897
Error 405.8667 8 50.73333

Total 1338.933 14

(iii) ANOVA TEST


ANOVA- Two Factor with replication
Problem: ANOVA with replication (two factors). A two-way ANOVA with replication is
performed when you have two groups and the individuals within each group are measured on more
than one thing (i.e., taking two tests).

         economics science history
SCHOOL A 42 69 35
         53 54 40
         49 58 53
         53 64 42
         43 64 50
SCHOOL B 44 55 39
         45 56 55
         52 0 39
         54 0 40
         0 0 0

Hypothesis Testing:
Row wise:
H01: There is no significant difference between School A and School B
H1: There is a significant difference between School A and School B

Column wise:
H02: There is no significant difference between economics, science and history

H2: There is a significant difference between economics, science and history

Interaction wise:
H03: There is no significant difference between school A and School B subject-wise (in
conjunction with subjects)
H3: There is a significant difference between school A and School B subject-wise (in
conjunction with subjects)

Step 1:

Step 2:

Step 3:

Output:
Anova: Two-Factor With Replication

SUMMARY economics science history Total


SCHOOL A
Count 5 5 5 15
Sum 240 309 220 769
Average 48 61.8 44 51.26667

Variance 28 34.2 54.5 95.6381

SCHOOL B
Count 5 5 5 15
Sum 195 111 173 479
Average 39 22.2 34.6 31.93333
Variance 494 924.2 420.3 579.4952

Total
Count 10 10 10
Sum 435 420 393
Average 43.5 42 39.3
Variance 254.5 861.5556 235.5667

ANOVA
Source of
Variation SS df MS F P-value F crit
Sample 2803.333 1 2803.333 8.6027 0.007272 4.259677
Columns 90.6 2 45.3 0.139014 0.870912 3.402826
Interaction 1540.467 2 770.2333 2.363646 0.115611 3.402826
Within 7820.8 24 325.8667

Total 12255.2 29

Decision Rule:
If f-stat is greater than f-critical, reject Null Hypothesis.
If p(f) is less than α, reject Null Hypothesis
Inference:
Row Wise
Since f stat (8.6027) is greater than f-critical (4.259677), reject null hypothesis.
Since p value (0.007) is less than α (0.05), we will reject null hypothesis.
Column Wise
Since f stat (0.139) is less than f-critical (3.402826), accept null hypothesis.
Since p value (0.871) is greater than α (0.05), we will accept null hypothesis.
Interaction Wise
Since f stat (2.364) is less than f-critical (3.402826), accept null hypothesis.
Since p value (0.116) is greater than α (0.05), we will accept null hypothesis.

Conclusion:

Row Wise
There is enough evidence that marks of students differ significantly school wise.

Column Wise
There is not enough evidence of a difference between the marks of the three subjects,
i.e., Economics, Science and History.

Interaction Wise
There is no significant difference between the marks of the School A and School B subject
wise (in conjunction with subjects).
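
The two-factor layout with replication can be sketched in R with aov and an interaction term. The snippet assumes the 30 values are entered exactly as in the Excel sheet, including the zeros in the School B block; the variable names are chosen here for illustration.

```r
# Five rows per school and subject, exactly as entered in the sheet above
school_a <- list(economics = c(42, 53, 49, 53, 43),
                 science   = c(69, 54, 58, 64, 64),
                 history   = c(35, 40, 53, 42, 50))
school_b <- list(economics = c(44, 45, 52, 54, 0),
                 science   = c(55, 56, 0, 0, 0),
                 history   = c(39, 55, 39, 40, 0))

# Long format: one mark per row, with factors for school and subject
marks <- c(unlist(school_a), unlist(school_b))
school <- factor(rep(c("A", "B"), each = 15))
subject <- factor(rep(rep(c("economics", "science", "history"), each = 5), times = 2))

# school * subject expands to both main effects plus their interaction
fit <- aov(marks ~ school * subject)
anova_table <- summary(fit)[[1]]
print(anova_table)

# F values for school (Sample), subject (Columns) and the interaction
f_values <- anova_table[["F value"]][1:3]
```

The rows of anova_table correspond, in order, to the Sample, Columns, Interaction and Within rows of the Excel output.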

CHI SQUARE TEST
Problem Statement- To analyse whether there is a significant relationship between gender and
newspaper brand.

Null Hypothesis : There is no significant relationship between gender and newspaper brand
Alternate Hypothesis : There is a significant relationship between gender and newspaper
brand

Observed
Count of Newspaper, Column Labels
Row Labels    Economic Times   Hindustan Times   The Indian Express   Times of India   Grand Total
female        13               16                8                    6                43
male          16               15                11                   7                49
Grand Total   29               31                19                   13               92

Expected values

Expected values = row total *column total/Grand total

Expected
Row Labels   Economic Times   Hindustan Times   The Indian Express   Times of India
female       13.55435         14.48913          8.880435             6.076087
male         15.44565         16.51087          10.11957             6.923913

Chi square test

Each cell = (Observed - Expected)² / Expected

Row Labels   Economic Times   Hindustan Times   The Indian Express   Times of India
female       0.022672         0.157548          0.087289             0.000953
male         0.019896         0.138256          0.076601             0.000836

χ² = 0.50404971
Degrees of freedom = (rows - 1) * (columns - 1) = (2-1)*(4-1) = 3

p value = 0.918

decision rule:

if p value is less than alpha, reject null hypothesis

inference:

here p value (0.918) is greater than alpha (0.05), accept null hypothesis

conclusion:

there is no significant relationship between gender and newspaper brand.
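
The expected counts, the chi-square statistic and the p-value can be cross-checked with chisq.test in R. The sketch assumes the observed counts from the pivot table above are typed into a matrix by hand; the object name observed is chosen here for illustration.

```r
# Observed counts: rows are gender, columns are newspaper brand
observed <- matrix(c(13, 16,  8, 6,
                     16, 15, 11, 7),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(gender = c("female", "male"),
                                   newspaper = c("Economic Times", "Hindustan Times",
                                                 "The Indian Express", "Times of India")))

# Chi-square test of independence between gender and newspaper brand
res <- chisq.test(observed)

print(res$expected)  # row total * column total / grand total
print(res)
```

res$expected reproduces the Expected table, and the printed X-squared, df and p-value correspond to χ², the degrees of freedom and the p value computed above.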

Regression Analysis
Problem: To check whether there is a significant relationship between umbrellas sold and
rainfall. Determine the regression equation for the same.

Umbrellas sold (Y) Rainfall (X)


5 80
23 78
25 60
48 53
17 85
8 84
4 73
26 79
11 81
19 75
14 68
35 72
29 58
4 92
23 65

HYPOTHESIS TESTING:
Null Hypothesis: There is no significant relationship between umbrellas sold and rainfall
Alternate Hypothesis: There is a significant relationship between umbrellas sold and rainfall
Y= Dependent Variable
X= Independent Variable
b= Slope
a= Intercept

Step 1:

Step 2:

Step 3:

Output:
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.713430174
R Square 0.508982614
Adjusted R Square 0.471212046
Standard Error 9.056631043
Observations 15

ANOVA
df SS MS F Significance F
Regression 1 1105.306644 1105.306644 13.47564 0.002822343
Residual 13 1066.293356 82.02256585
Total 14 2171.6

              Coefficients  Standard Error  t Stat      P-value     Lower 95%   Upper 95%   Lower 95.0%  Upper 95.0%
Intercept     78.978421     16.3974364      4.81651028  0.00033673  43.5539133  114.402929  43.5539133   114.402929
Rainfall (X)  -0.8102233    0.22071407      -3.6709183  0.00282234  -1.2870471  -0.3333996  -1.2870471   -0.3333996

Equation:

Y = bX + a
Y = -0.8102X + 78.978

[Figure: Rainfall (X) Line Fit Plot. Umbrellas sold (Y) and Predicted Umbrellas sold (Y) plotted against Rainfall (X), with linear trendline f(x) = -0.810223313272095 x + 78.9784209692747]

Decision Rule:
If p is less than alpha, reject null hypothesis.

Inference:
Here p (0.0028, the P-value of the Rainfall (X) coefficient) is less than alpha (0.05), so we will reject null hypothesis

Conclusion
There is a significant relationship between umbrellas sold and rainfall
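
The regression equation can be cross-checked with lm in R. The sketch assumes the 15 pairs from the table above are typed in by hand; the names umbrellas and rainfall are chosen here for illustration.

```r
# Data copied from the table above
umbrellas <- c(5, 23, 25, 48, 17, 8, 4, 26, 11, 19, 14, 35, 29, 4, 23)  # Y
rainfall <- c(80, 78, 60, 53, 85, 84, 73, 79, 81, 75, 68, 72, 58, 92, 65)  # X

# Simple linear regression of Y on X
fit <- lm(umbrellas ~ rainfall)
coefs <- coef(fit)
print(coefs)  # intercept (a) and slope (b) of Y = bX + a

# p-value of the slope, used in the decision rule above
slope_p <- summary(fit)$coefficients["rainfall", "Pr(>|t|)"]
print(slope_p)
```

The intercept and slope correspond to the Coefficients column of the Excel output, and slope_p corresponds to the P-value of the Rainfall (X) row.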

R Studio
HYPOTHESIS TESTING in R Studio

• How to Install R Studio?

In order to install R Studio, we first need to install R. The steps to install R are as
follows:

1. Go to CRAN, click Download R for Windows, click Base, and download the installer for the
latest R version.
2. Right-click the installer file and select Run as Administrator from the pop-up menu.
3. Select the language to be used during installation.
This doesn’t change the language used by R; all messages and Help files remain in English.
4. Follow the instructions of the installer.
You can safely use the default settings and just keep clicking Next until R starts installing.

After installing R, we can install and set up R Studio. The steps are as follows:

1. Install R. Leave all default settings in the installation options.
2. Open RStudio.
3. Go to the “Packages” tab and click on “Install Packages”. ...
4. Start typing “Rcmdr” until you see it appear in a list. ...
5. Wait while all the parts of the R Commander package are installed.

R and RStudio

R is a programming language used for statistical computing, while RStudio is an application for
writing and running programs in the R language. In R, you can write a program and run the code
independently of any other computer program. RStudio, however, must be used alongside R in
order to function properly. Often referred to as an IDE, or integrated development
environment, RStudio allows users to develop and edit programs in R, and it supports a large
number of statistical packages, higher quality graphics, and the ability to manage your
workspace.

R and RStudio are not separate versions of the same program, and cannot be substituted for
one another. R may be used without RStudio, but RStudio may not be used without R.

The Advantages of RStudio

1) RStudio is designed to make it easy to write scripts.

As soon as you create a new script, the windows within your RStudio session adjust
automatically so you can see both your script and the results in your console when you run
your syntax.

Even better is the ability to call up potential syntax options while you are writing just by
using the tab key.

For example, suppose I am trying to access a variable in a data set called “teachers”, but I
haven’t memorized the variable names:

2) RStudio makes it convenient to view and interact with the objects stored in your
environment.

In the basic R GUI, you can always list the objects you have stored in your environment. But
RStudio has a very useful “Environment” window available.

This shows all of the objects that you have stored, including data; scalars, vectors, and
matrices; model outputs; etc., along with a summary of the information that is stored in those
objects.

You can even click on your data sets directly to open them and view them as spreadsheets.

3) RStudio makes it easy to set your working directory and access files on your computer.

Especially if you are working in Windows, one of the most tedious parts of programming in
R is setting your working directory to access your files.

With RStudio, you can navigate to folders on your computer in the “Files” window, view any
files you have in that folder, and set that folder as the working directory.

Command :

setwd("c:/Documents/my/working/directory")

Set a default working directory

A default working directory is a folder where RStudio goes, every time you open it. You can
change the default working directory from RStudio menu under: Tools –> Global options –>
click on “Browse” to select the default working directory you want.

4) RStudio makes graphics much more accessible for a casual user.

The basic R GUI requires you to go to some lengths to save graphics as you go. But RStudio
has a window that does exactly that.

You can easily click back and forth between plots, change the sizes of your plot without
rerunning the code, and export or copy plots to include in other documents.

Four Panes in RStudio

RStudio is a four-pane workspace for 1) creating a file containing R script, 2) typing R
commands, 3) viewing command histories, 4) viewing plots and more.

Top-left panel: Code editor allowing you to create and open a file containing R script. The R
script is where you keep a record of your work. An R script can be created as follows: File –>
New File –> R Script.

Bottom-left panel: R console for typing R commands

Top-right panel:

Workspace tab: shows the list of R objects you created during your R session

History tab: shows the history of all previous commands

Bottom-right panel:

Files tab: shows files in your working directory

Plots tab: shows the history of plots you created. From this tab, you can export a plot to a PDF
or an image file

Packages tab: shows external R packages available on your system. If checked, the package is
loaded in R.

IMPORT OF DATA SHEET IN R STUDIO

Step 1:

Step 2:

Step 3 [Output]:
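
The import can also be done with a command instead of the menu. The sketch below is a rough console equivalent that uses a small CSV file so that it is self-contained; the file name and the data values are hypothetical.

```r
# Create a small CSV file in a temporary folder (hypothetical data and file name)
csv_file <- file.path(tempdir(), "tv_hours.csv")
write.csv(data.frame(Hours = c(23, 27, 30, 33, 39)), csv_file, row.names = FALSE)

# Import the file into a data frame
tv_hours <- read.csv(csv_file)

# Inspect the structure and the first rows of the imported data
str(tv_hours)
head(tv_hours)

# For an .xlsx workbook the analogous call, assuming the readxl package is
# installed, would be: tv_hours <- readxl::read_excel("path/to/file.xlsx")
```

Once imported, a column is accessed as tv_hours$Hours, in the same way as the data sets used in the sections that follow.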

Descriptive statistics using R-Studio
Step 1:

Step 2:

Step 3:

Output:

Coding:
For Summary Statistics:
summary(one_sample_t_test_Rstudio$Hours)
For Standard Deviation:
sd(one_sample_t_test_Rstudio$Hours)
For Variance:
var(one_sample_t_test_Rstudio$Hours)

Result:
For Summary Statistics:
summary(one_sample_t_test_Rstudio$Hours)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  23.00   27.00   30.00   30.48   33.70   39.00

For Standard Deviation:
sd(one_sample_t_test_Rstudio$Hours)
[1] 4.374199

For Variance:
var(one_sample_t_test_Rstudio$Hours)
[1] 19.13362
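
The three commands can be tried on any numeric vector. The sketch below uses a small hypothetical vector (not the Hours data) and also checks that the variance is the square of the standard deviation, a relation that the reported values above satisfy as well.

```r
# Hypothetical sample, used only for illustration (not the original Hours column)
hours <- c(23, 27, 30, 33, 39)

# Five-number summary plus the mean
print(summary(hours))

# Sample standard deviation and sample variance (both use n - 1 in the denominator)
sd_hours  <- sd(hours)
var_hours <- var(hours)

# The variance is the square of the standard deviation
print(all.equal(var_hours, sd_hours^2))

# The same relation holds for the values reported in the Result above
reported_sd  <- 4.374199
reported_var <- 19.13362
print(abs(reported_sd^2 - reported_var) < 1e-4)
```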

Correlation using R-Studio

Step 1:

Step 2:

Step 3:

Output:

Coding:
cor.test(corelation$`Advertisement in month`,corelation$`Sales in crores`)

Result:
Pearson's product-moment correlation

data: corelation$`Advertisement in month` and corelation$`Sales in crores`

t = 1.359, df = 6, p-value = 0.223
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3335576 0.8866886
sample estimates:
cor
0.4851491

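Inference:
Here r = +0.49, so the sample shows a moderate positive correlation between
advertisements and sales. However, the p-value (0.223) is greater than 0.05 and the 95 percent
confidence interval includes 0, so the correlation is not statistically significant.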
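
The t statistic and p-value reported by cor.test() can be recomputed from r and the degrees of freedom alone. The sketch below applies the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2) to the values in the Result above; the variable names are chosen here for illustration.

```r
# Values taken from the cor.test() output above
r  <- 0.4851491   # sample correlation
df <- 6           # degrees of freedom, so the sample size is n = df + 2 = 8

# t statistic for testing that the true correlation is zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

print(round(t_stat, 3))
print(round(p_value, 3))
```

The two printed values should agree with the t and p-value lines of the output.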

Hypothesis Testing using R-Studio

(i) One sample t test


Problem: Suppose that we want to hypothesize that the mean
number of TV hours watched per week is greater than 28.5

HYPOTHESIS TESTING:
Null Hypothesis: Mean number of TV hours watched per week is less than or equal to 28.5
Alternate Hypothesis: Mean number of TV hours watched per week is greater than 28.5

Step 1:

Step 2:

Step 3:

Coding:
t.test(one_sample_t_test_Rstudio$Hours,alternative = "greater",mu=28.5)

Result:
data: one_sample_t_test_Rstudio$Hours
t = 2.441, df = 28, p-value = 0.01061
alternative hypothesis: true mean is greater than 28.5
95 percent confidence interval:
29.10098 Inf
sample estimates:
mean of x
30.48276

DECISION RULE:
If the p-value is less than α, reject the null hypothesis

INFERENCE:
Since the p-value (0.01061) is less than α (0.05), reject the null hypothesis.

CONCLUSION:
Mean number of TV hours watched per week is greater than 28.5
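
As a check on the output, the test can be reproduced by hand from the summary figures already reported: the sample mean, the standard deviation from the descriptive statistics section, and n = df + 1 = 29. This is a sketch of the standard one-sample t formulas, not the code used for the Result above.

```r
# Summary figures taken from the outputs reported in this file
x_bar <- 30.48276   # sample mean of Hours
s     <- 4.374199   # sample standard deviation of Hours
n     <- 29         # sample size (df = 28)
mu0   <- 28.5       # hypothesised mean

# Standard error and t statistic
se     <- s / sqrt(n)
t_stat <- (x_bar - mu0) / se

# One-tailed p-value for the alternative "greater"
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# Lower bound of the one-sided 95 percent confidence interval
lower_bound <- x_bar - qt(0.95, df = n - 1) * se

print(round(c(t = t_stat, p = p_value, lower = lower_bound), 5))
```

The printed values should match the t, p-value and confidence-interval lines of the t.test() output up to rounding.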

(ii) Two Sample T-test

Problem: To analyse whether there is a significant difference between
the marks scored by class groups A & B in mathematics at α=10%

HYPOTHESIS TESTING:
Null Hypothesis: There is no significant difference between the marks scored by class
groups A & B in mathematics
Alternate Hypothesis: There is significant difference between the marks scored by class
groups A & B in mathematics

Step 1:

Step 2:

Step 3:

Coding:
t.test(twosample_t_test2$`Group A`,twosample_t_test2$`Group B`,conf.level = 0.90)

Result:

data: twosample_t_test2$`Group A` and twosample_t_test2$`Group B`
t = 1.7863, df = 26.177, p-value = 0.08565
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
0.3200806 13.7851826
sample estimates:
mean of x mean of y
82.47368 75.42105

DECISION RULE:
If the p-value is less than α, reject the null hypothesis

INFERENCE:
Since the p-value (0.08565) is less than α (0.10), reject the null hypothesis.

CONCLUSION:
There is significant difference between the marks scored by class
groups A & B in mathematics
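
The 90 percent confidence interval in the output tells the same story as the p-value: it does not contain 0. The sketch below rebuilds that interval from the reported means, t statistic and degrees of freedom. Because the standard error is backed out from the rounded t value, the result matches the output only approximately.

```r
# Values taken from the t.test() output above
mean_a <- 82.47368
mean_b <- 75.42105
t_stat <- 1.7863
df     <- 26.177    # Welch degrees of freedom (need not be a whole number)

# Difference in means and the standard error implied by the t statistic
diff_means <- mean_a - mean_b
se         <- diff_means / t_stat

# 90 percent confidence interval for the difference in means
margin <- qt(0.95, df) * se
ci     <- c(diff_means - margin, diff_means + margin)
print(round(ci, 3))

# The interval excludes 0, which is consistent with p < 0.10
print(ci[1] > 0)
```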

(iii) Paired Sample T-Test

Problem: Determine whether there is a significant difference between
the time to finish the race when the race is completed with local shoes
and branded shoes.

HYPOTHESIS TESTING:
Null Hypothesis: There is no significant difference between
the time to finish the race when race is completed with local shoes
and branded shoes.

Alternate Hypothesis: There is a significant difference between
the time to finish the race when race is completed with local shoes
and branded shoes.

Step 1:

Step 2:

Step 3:

Coding:
t.test(PT_TEST_R_STUDIO_1_$`Local shoes`,PT_TEST_R_STUDIO_1_$`Branded shoes`,paired = TRUE)

Result:
data: PT_TEST_R_STUDIO_1_$`Local shoes` and PT_TEST_R_STUDIO_1_$`Branded shoes`
t = 0.079506, df = 14, p-value = 0.9378
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-1.558575 1.678575
sample estimates:
mean difference
0.06

DECISION RULE:
If the p-value is less than α, reject the null hypothesis

INFERENCE:
Since the p-value (0.9378) is greater than α (0.05), we fail to reject the null hypothesis.

CONCLUSION:
There is no significant difference between the time to finish the race when race is completed
with local shoes and branded shoes.
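
A paired t test is the same as a one-sample t test on the pairwise differences. The sketch below illustrates this with hypothetical race times; these are not the data from the original sheet.

```r
# Hypothetical race times in seconds for the same six runners (illustration only)
local_shoes   <- c(12.1, 11.8, 13.0, 12.5, 11.9, 12.7)
branded_shoes <- c(11.9, 12.0, 12.8, 12.6, 11.7, 12.9)

# Paired t test on the two columns
paired_result <- t.test(local_shoes, branded_shoes, paired = TRUE)

# One-sample t test on the differences, testing a mean difference of 0
diff_result <- t.test(local_shoes - branded_shoes, mu = 0)

# Both approaches give the same t statistic and p-value
print(all.equal(unname(paired_result$statistic), unname(diff_result$statistic)))
print(all.equal(paired_result$p.value, diff_result$p.value))
```

This is why the output reports a single "mean difference" rather than two group means.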

F Test Using R-Studio

Problem: Determine whether Variance of Class1 is greater than
variance of class2 in mathematics.

HYPOTHESIS TESTING:
Null Hypothesis: Variance of Class1 is not greater than variance of
class2 in mathematics.

Alternate Hypothesis: Variance of Class1 is greater than variance of
class2 in mathematics.

Step 1:

Step 2:

Step 3:

Coding:
var.test(f_test$Class1,f_test$Class2)
Result:
data: f_test$Class1 and f_test$Class2
F = 2.1317, num df = 5, denom df = 5, p-value = 0.4258
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.2982922 15.2340118
sample estimates:
ratio of variances
2.13171

DECISION RULE:
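If the p-value is less than α, reject the null hypothesis

INFERENCE:
var.test() was run with its default two-sided alternative, which gives a p-value of 0.4258. For the
one-tailed hypothesis stated above (alternative = "greater"), the p-value is half of this, about
0.2129, because F is greater than 1. Either way the p-value is greater than α (0.05), so we fail to
reject the null hypothesis.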

CONCLUSION:
Variance of Class1 is not greater than variance of class2 in
mathematics.
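
Since the hypothesis is one-tailed while the output above is two-sided, it can help to compute both p-values directly from the F statistic. The sketch below uses the reported F and degrees of freedom; the variable names are chosen here for illustration.

```r
# Values taken from the var.test() output above
f_stat <- 2.1317
df1    <- 5   # numerator degrees of freedom
df2    <- 5   # denominator degrees of freedom

# One-tailed p-value for the alternative "variance of Class1 is greater"
p_one_tailed <- pf(f_stat, df1, df2, lower.tail = FALSE)

# Two-sided p-value: twice the smaller tail probability
p_two_tailed <- 2 * min(p_one_tailed, 1 - p_one_tailed)

print(round(c(one_tailed = p_one_tailed, two_tailed = p_two_tailed), 4))

# With the data loaded, the one-tailed test could be requested directly with
# var.test(f_test$Class1, f_test$Class2, alternative = "greater")
```

Both p-values are above 0.05, so the decision does not depend on which version is used.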

ANOVA using R-Studio

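Problem: To test whether the mean marks of the students in the subjects economics, science and
history are all equal.

HYPOTHESIS TESTING:
Null Hypothesis: The mean marks in economics, science and history are all equal.
Alternate Hypothesis: At least one subject has a mean that differs from the others.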

Step 1:

Step 2:

Step 3:

Coding:
combinedgroup=data.frame(cbind(ANOVA$Economics,ANOVA$Science,ANOVA$History))
summary(combinedgroup)
stack(combinedgroup)
stackedgroup=stack(combinedgroup)
anovaresult=aov(values~ind, data=stackedgroup)
summary(anovaresult)

Result:
> combinedgroup=data.frame(cbind(ANOVA$Economics,ANOVA$Science,ANOVA$History))
> summary(combinedgroup)
       X1              X2             X3
 Min.   :42.00   Min.   :54.0   Min.   :35.00
 1st Qu.:44.00   1st Qu.:55.5   1st Qu.:39.00
 Median :49.00   Median :58.0   Median :40.00
 Mean   :48.33   Mean   :60.0   Mean   :43.67
 3rd Qu.:53.00   3rd Qu.:64.0   3rd Qu.:50.00
 Max.   :54.00   Max.   :69.0   Max.   :55.00
                 NA's   :2
> stack(combinedgroup)
values ind
1 42 X1
2 53 X1
3 49 X1
4 53 X1
5 43 X1
6 44 X1
7 45 X1
8 52 X1
9 54 X1
10 69 X2
11 54 X2
12 58 X2
13 64 X2
14 64 X2
15 55 X2
16 56 X2
17 NA X2
18 NA X2
19 35 X3
20 40 X3
21 53 X3
22 42 X3
23 50 X3
24 39 X3
25 55 X3
26 39 X3
27 40 X3
> stackedgroup=stack(combinedgroup)
> anovaresult=aov(values~ind, data=stackedgroup)
> summary(anovaresult)
            Df Sum Sq Mean Sq F value   Pr(>F)
ind          2   1086   542.9    15.2 7.16e-05 ***
Residuals   22    786    35.7
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
2 observations deleted due to missingness

DECISION RULE:
If the p-value is less than α, reject the null hypothesis

INFERENCE:
Since the p-value (7.16e-05) is less than α (0.05), reject the null hypothesis.

CONCLUSION:
The mean marks of the students in economics, science and history are not all equal; at least one
subject mean differs from the others.
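
The analysis can be rerun without the original sheet by typing in the marks listed in the stack() output above. The sketch below does this with subject names instead of X1, X2 and X3 (the mapping follows the column order in the cbind() call) and pulls the F value and p-value out of the ANOVA table; the object names are chosen here for illustration.

```r
# Marks copied from the stack() output above; X1 = Economics, X2 = Science, X3 = History
marks <- list(
  Economics = c(42, 53, 49, 53, 43, 44, 45, 52, 54),
  Science   = c(69, 54, 58, 64, 64, 55, 56),   # the two missing values are omitted
  History   = c(35, 40, 53, 42, 50, 39, 55, 39, 40)
)

# stack() turns the list into long format: one column of values, one of group labels
stacked <- stack(marks)

# One-way ANOVA of marks by subject
fit <- aov(values ~ ind, data = stacked)
anova_table <- summary(fit)[[1]]

# Extract the F value and p-value for the subject effect
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]
print(round(f_value, 2))
print(p_value < 0.05)
```

Using a named list keeps the subject names as group labels in the output, which makes the table easier to read than X1, X2 and X3.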
