
Chi Square

Step 1:
Ho: the variables are independent
H1: the variables are dependent

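Step 2
Calculate the expected value table
Expected value = (Column Total * Row Total) / Overall Total

Step 3
Calculate Chi Square
List the observed values Oi in one column (going downwards)
List the expected values Ei in the next column (going downwards)
Compute Oi - Ei
Compute (Oi - Ei)^2
Compute (Oi - Ei)^2 / Ei; the sum of this column is the calculated Chi Square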

Step 4
Degree of freedom= (No of Rows - 1) * (No of columns - 1)

Step 5
Significance level (0.05 assumed if not mentioned in the question)
Read the tabulated Chi Square from the Chi Square table
Compare
If tabulated is more than calculated, Ho is accepted
If tabulated is less than calculated, Ho is rejected
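
The five steps above can be sketched in a few lines of Python. This is only an illustration: the 2 x 2 observed table is made up (it does not come from these notes), and the tabulated value 3.841 is the standard Chi Square table entry for 1 degree of freedom at the 0.05 level.

```python
# Hypothetical 2x2 observed table (made-up counts, not from these notes).
observed = [
    [10, 20],
    [20, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
overall_total = sum(row_totals)

# Step 2: expected value = (column total * row total) / overall total
expected = [[r * c / overall_total for c in col_totals] for r in row_totals]

# Step 3: sum of (Oi - Ei)^2 / Ei over every cell
chi_square_calculated = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Step 4: degrees of freedom = (rows - 1) * (columns - 1)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Step 5: tabulated value for df = 1 at the 0.05 level, read from a Chi Square table
chi_square_tabulated = 3.841

# Ho is rejected when the tabulated value is less than the calculated value
h0_rejected = chi_square_tabulated < chi_square_calculated
print(round(chi_square_calculated, 4), degrees_of_freedom, h0_rejected)
```

For a table of a different size, the tabulated value has to be looked up again for the new degrees of freedom.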
A B C D
20 25 24 23
19 23 20 20
21 21 22 20

Ho: There is no difference in the mean lifetime of the 4 brands of electric bulb
H1: There is a significant difference in the mean lifetime of the 4 brands of electric bulb

We are running a one-way analysis of variance

Fcal = 1.666667
Ftab = 4.066181
Also, the P-value (0.250324) is greater than 0.05
Hence, for the ANOVA, Ftab > Fcal
Hence Ho is accepted
Hence there is no difference in the mean lifetime of the 4 brands of electric bulb
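
As a cross-check of Fcal, the one-way ANOVA can be recomputed by hand from the table of lifetimes above. This is a minimal sketch in plain Python; the F crit value is copied from the Excel output in these notes rather than computed.

```python
# Lifetimes of the four bulb brands from the table above (3 observations each).
groups = {
    "A": [20, 19, 21],
    "B": [25, 23, 21],
    "C": [24, 20, 22],
    "D": [23, 20, 20],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# Between-groups sum of squares: group size * (group mean - grand mean)^2
ss_between = sum(
    len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
)
# Within-groups sum of squares: squared deviations from each group's own mean
ss_within = sum(
    (x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_cal = (ss_between / df_between) / (ss_within / df_within)
f_tab = 4.066181  # F crit as reported in the Excel output of these notes

print(round(f_cal, 6), "Ho accepted" if f_tab > f_cal else "Ho rejected")
```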

Q8
A    B    C    D
19   35   24   23
19   23   17   18
18   17   15   19
18   21   27   27
17   19   20   17

Ho: There is no difference among the lifetimes of the bulbs of the brands
H1: There is a difference among the lifetimes of the bulbs of the brands

Interpretation
Hence, for the ANOVA, Ftab (3.238872) > Fcal (0.835141), so Ho is accepted
Hence there is no difference in the mean lifetime of the 4 brands of electric bulb
TIP for hypothesis
Ho is no difference in mean values
H1 is difference in mean values

Anova: Single Factor

SUMMARY
Groups     Count   Sum   Average   Variance
Column 1   3       60    20        1
Column 2   3       69    23        4
Column 3   3       66    22        4
Column 4   3       63    21        3

ANOVA
Source of Variation   SS   df   MS   F          P-value    F crit
Between Groups        15   3    5    1.666667   0.250324   4.066181
Within Groups         24   8    3

Total                 39   11

Anova: Single Factor (Q8)

SUMMARY
Groups   Count   Sum   Average   Variance
A        5       91    18.2      0.7
B        5       115   23        50
C        5       103   20.6      24.3
D        5       104   20.8      17.2

ANOVA
Source of Variation   SS       df   MS      F          P-value    F crit
Between Groups        57.75    3    19.25   0.835141   0.494077   3.238872
Within Groups         368.8    16   23.05

Total                 426.55   19

           Salesman
Seasons    A    B    C    D
Summer     36   36   21   35
Winter     28   29   31   32
Monsoon    26   28   29   29

Hypothesis
For Salesmen
Ho: No difference in the performance of the salesmen
H1: Difference in the performance of the salesmen
For Seasons
Ho: No difference in sales due to the impact of season
H1: Difference in sales due to the impact of season
Interpretation
Step 1:
There are two sources of variation
rows represent the seasons
columns represent the salesmen's efficiency

There are 3 seasons, therefore the degree of freedom for season is 3 - 1 = 2
Since the SS (Sum of Squares) due to season = 32, the MS (Mean Sum of Squares) for season = 32/2 = 16

There are 4 salesmen, therefore the degree of freedom for salesmen is 4 - 1 = 3
Since the SS due to salesmen (columns) = 42, the MS for salesmen = 42/3 = 14

The error has (3 - 1) * (4 - 1) = 6 degrees of freedom and SS = 136, so the MS for error = 136/6 = 22.67

Step 2
Calculation of the F statistics for ANOVA for the two hypotheses
F = MS columns / MS error
F = MS rows / MS error

Fcal ANOVA for salesmen
Since salesmen are in columns, Fcal = 14/22.67 = 0.617647

Fcal ANOVA for season
Since seasons are in rows, Fcal = 16/22.67 = 0.705882

Therefore for salesmen Ftab (4.757063) > Fcal (0.617647)
Thus Ho is accepted, hence there is no difference in the performance of the salesmen

Therefore for season Ftab (5.143253) > Fcal (0.705882)
Thus Ho is accepted, hence there is no difference in the sales due to the impact of season
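
The arithmetic of Steps 1 and 2 can be checked with a short script. This is a sketch that starts from the SS and df values in the Excel two-factor output; the F crit values are copied from the same output, and the dictionary keys are illustrative labels only.

```python
# SS and df values as reported in the two-factor ANOVA output for this question.
ss = {"rows (season)": 32, "columns (salesmen)": 42, "error": 136}
df = {"rows (season)": 2, "columns (salesmen)": 3, "error": 6}
# F crit values as reported in the same output (0.05 level).
f_tab = {"rows (season)": 5.143253, "columns (salesmen)": 4.757063}

# Mean square = sum of squares / degrees of freedom
ms = {source: ss[source] / df[source] for source in ss}

# F = mean square of the factor / mean square of the error
f_cal = {source: ms[source] / ms["error"] for source in f_tab}

for source, f in f_cal.items():
    decision = "Ho accepted" if f_tab[source] > f else "Ho rejected"
    print(f"{source}: Fcal = {f:.6f}, Ftab = {f_tab[source]}, {decision}")
```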

Q10
          TREATMENT
DOCTOR    1    2    3    4    5
1         10   14   23   18   20
2         11   15   24   17   21
3         9    12   20   16   19
4         8    13   17   17   20
5         12   15   19   15   22

Tip: In the interpretation, replace the P-value part with F crit


Interpretation

This ANOVA table represents a two-factor (or two-way) ANOVA without replication, which means there are two independent variables, and each combination of these variables is only tested once.

The table shows the sum of squares (SS), degrees of freedom (df), mean squares (MS), F-statistic, and p-value for each source of variation.

The first factor, "Rows", has a sum of squares of 760.67 and 5 degrees of freedom, indicating that there are 6 levels (or groups) for this factor. The mean square for this factor is 152.13, and the F-statistic is 42.65 with a very small p-value of 5.39E-10, indicating that this factor has a significant effect on the dependent variable.

Note that the data has only 5 doctors. The sixth row level is the summary row labelled "DOCTOR" (Count 5, Sum 15), which is the header row of treatment numbers 1 to 5 picked up by the input range. The figures quoted here describe the Excel output as it was run; on the 5 x 5 data alone each factor would have 4 degrees of freedom.

The second factor, "Columns", has a sum of squares of 379.87 and 4 degrees of freedom, indicating that there are 5 levels for this factor. The mean square for this factor is 94.97, and the F-statistic is 26.63 with a very small p-value of 9.19E-08, indicating that this factor also has a significant effect on the dependent variable.

The "Error" row shows the sum of squares, degrees of freedom, and mean square for the variability that cannot be attributed to the two factors. The mean square for error is 3.57.

The "Total" row shows the total sum of squares and degrees of freedom for all sources of variation.
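
The Excel output for Q10 reports 6 row levels, and its summary row labelled "DOCTOR" sums to 15 = 1 + 2 + 3 + 4 + 5, which suggests the header row of treatment numbers was included in the input range. A minimal sketch, in plain Python, of the same two-factor ANOVA without replication on the 5 x 5 data only (the function name `two_way_anova` is illustrative, not from the notes):

```python
# Q10 data: rows are doctors 1-5, columns are treatments 1-5 (header row excluded).
data = [
    [10, 14, 23, 18, 20],
    [11, 15, 24, 17, 21],
    [9, 12, 20, 16, 19],
    [8, 13, 17, 17, 20],
    [12, 15, 19, 15, 22],
]


def two_way_anova(table):
    """Two-factor ANOVA without replication; returns (SS, df, F) dictionaries."""
    n_rows, n_cols = len(table), len(table[0])
    grand_mean = sum(map(sum, table)) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(col) / n_rows for col in zip(*table)]

    ss = {
        "rows": n_cols * sum((m - grand_mean) ** 2 for m in row_means),
        "columns": n_rows * sum((m - grand_mean) ** 2 for m in col_means),
        "total": sum((x - grand_mean) ** 2 for row in table for x in row),
    }
    # Error is what remains after removing the row and column effects
    ss["error"] = ss["total"] - ss["rows"] - ss["columns"]
    df = {
        "rows": n_rows - 1,
        "columns": n_cols - 1,
        "error": (n_rows - 1) * (n_cols - 1),
    }
    ms_error = ss["error"] / df["error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("rows", "columns")}
    return ss, df, f


ss, df, f = two_way_anova(data)
for source in ("rows", "columns"):
    print(source, round(ss[source], 2), df[source], round(f[source], 4))
```

On the 5 x 5 block this gives SS of about 25.84 for doctors, 406.64 for treatments and 34.56 for error, with 4, 4 and 16 degrees of freedom, so Fcal is about 2.99 for doctors and 47.06 for treatments. The matching F crit for (4, 16) degrees of freedom still has to be read from an F table. Applied to the salesmen table earlier in these notes, the same function reproduces the reported SS of 32, 42 and 136.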
Tip: If there are two variables, e.g. rows represent the season and columns represent the salesmen's efficiency, then use two-factor ANOVA

Anova: Two-Factor Without Replication

SUMMARY     Count   Sum   Average

Row 1       4       128   32
Row 2       4       120   30
Row 3       4       112   28

Column 1    3       90    30
Column 2    3       93    31
Column 3    3       81    27
Column 4    3       96    32

ANOVA
Source of Variation   SS    df   MS
Rows                  32    2    16
Columns               42    3    14
Error                 136   6    22.66667

Total                 210   11

Anova: Two-Factor Without Replication (Q10)

SUMMARY        Count   Sum   Average    Variance

DOCTOR         5       15    3          2.5
1              5       85    17         26
2              5       88    17.6       25.8
3              5       76    15.2       21.7
4              5       75    15         21.5
5              5       83    16.6       15.3

TREATMENT 1    6       51    8.5        15.5
TREATMENT 2    6       71    11.83333   24.56667
TREATMENT 3    6       106   17.66667   58.26667
TREATMENT 4    6       87    14.5       27.5
TREATMENT 5    6       107   17.83333   40.56667

ANOVA
Source of Variation   SS        df   MS         F
Rows                  760.667   5    152.1333   42.65421
Columns               379.867   4    94.96667   26.62617
Error                 71.333    20   3.566667

Total                 1211.867  29
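Remaining columns of the Anova: Two-Factor Without Replication output for the salesmen and seasons data

SUMMARY     Variance

Row 1       54
Row 2       3.333333
Row 3       2

Column 1    28
Column 2    19
Column 3    28
Column 4    9

ANOVA
Source of Variation   F          P-value    F crit
Rows                  0.705882   0.530504   5.143253
Columns               0.617647   0.628574   4.757063

Remaining columns of the ANOVA output for Q10 (doctors and treatments)

Source of Variation   P-value     F crit
Rows                  5.391E-10   2.71089
Columns               9.188E-08   2.866081
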
Sr No.   Preference (Y)   Nutrition Value (X1)   Taste (X2)   Preservation Quality (X3)
1        7                5                      6            5
2        6                4                      6            6
3        5                5                      7            4
4        6                6                      7            5
5        4                3                      2            4
6        2                2                      1            2
7        3                3                      2            3
8        6                5                      6            5
9        7                7                      7            6
10       5                6                      5            4
11       4                4                      3            2
12       3                6                      2            3
13       1                1                      2            1
14       2                2                      3            1
15       4                5                      4            3
16       4                4                      5            4
17       3                2                      1            3
18       6                7                      5            4
19       6                5                      5            6
20       7                6                      4            5

Q1: What is a scatter plot and what does it tell us?
Q2: What are positive and negative correlations? Draw plots for perfect positive and perfect negative correlations
Q3: Compute the bivariate correlation between preference and nutrition value and interpret
Q4: Compute the bivariate correlation between preference and taste and interpret
Q5: Compute the multiple correlation and interpret
Q6: Write the multiple regression equation and interpret
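Q3:
                       Preference (Y)   Nutrition Value (X1)
Preference (Y)         1
Nutrition Value (X1)   0.798821         1

r = 0.798
D = r^2 = 0.636804

Interpretation
The correlation between nutrition value and preference is r = 0.798, a strong positive correlation. The coefficient of determination D = r^2 = 0.6368 means that about 63.68% of the variation in preference is explained by nutrition value.

Q4:
                 Preference (Y)   Taste (X2)
Preference (Y)   1
Taste (X2)       0.801351         1

r = 0.801
D = r^2 = 0.641601

Interpretation
The correlation between preference and taste is r = 0.801, a strong positive correlation. The coefficient of determination D = r^2 = 0.6416 means that about 64.16% of the variation in preference is explained by taste.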
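
The two correlation coefficients can be reproduced directly from the data table. This is a minimal sketch in plain Python; the helper name `pearson_r` is illustrative and the lists are the Preference, Nutrition Value and Taste columns of the table above.

```python
from math import sqrt

preference = [7, 6, 5, 6, 4, 2, 3, 6, 7, 5, 4, 3, 1, 2, 4, 4, 3, 6, 6, 7]
nutrition = [5, 4, 5, 6, 3, 2, 3, 5, 7, 6, 4, 6, 1, 2, 5, 4, 2, 7, 5, 6]
taste = [6, 6, 7, 7, 2, 1, 2, 6, 7, 5, 3, 2, 2, 3, 4, 5, 1, 5, 5, 4]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


for name, x in (("nutrition", nutrition), ("taste", taste)):
    r = pearson_r(x, preference)
    # r^2 is the share of variation in preference explained by x
    print(f"{name}: r = {r:.6f}, r^2 = {r * r:.4f}")
```

Squaring the unrounded r gives a slightly different D than squaring the three-decimal value used in the notes (0.798^2 = 0.636804).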

Q5:
                            Preference (Y)   Nutrition Value (X1)   Taste (X2)   Preservation Quality (X3)
Preference (Y)              1
Nutrition Value (X1)        0.798821         1
Taste (X2)                  0.801351         0.688627               1
Preservation Quality (X3)   0.903608         0.671148               0.731719     1

Interpretation: same as above for each pair of variables

Q6: SUMMARY OUTPUT

Regression Statistics
Multiple R          0.94692
R Square            0.896658
Adjusted R Square   0.877281
Standard Error      0.637642
Observations        20

ANOVA
             df   SS         MS         F          Significance F
Regression   3    56.44461   18.81487   46.27512   4.138E-08
Residual     16   6.505394   0.406587
Total        19   62.95

                            Coefficients   Standard Error   t Stat      P-value    Lower 95%   Upper 95%
Intercept                   -0.020473      0.422419         -0.048467   0.961944   -0.915963   0.875016
Nutrition Value (X1)        0.295071       0.121869         2.421206    0.027723   0.036719    0.553423
Taste (X2)                  0.161534       0.114679         1.408576    0.178097   -0.081575   0.404643
Preservation Quality (X3)   0.684682       0.147873         4.630216    0.000278   0.371206    0.998158

Interpretation
General form: Y = M1X1 + M2X2 + M3X3 + ... + C

Y is the dependent variable
X is an independent variable
M is the slope
C is the constant

Therefore
Y = 0.2950X1 + 0.161X2 + 0.684X3 - 0.0204
the slope of X1, i.e. nutrition value, is 0.295
the slope of X2, i.e. taste, is 0.162
the slope of X3, i.e. preservation quality, is 0.685
the constant, C, is -0.0204

The multiple regression equation based on the given coefficients is:

Y = -0.020473447 + 0.295071203X1 + 0.161534279X2 + 0.68468234X3

In this equation, Y is the dependent variable, and X1, X2, and X3 are the independent variables. The coefficients represent the slope of the line for each independent variable when holding all other variables constant.

Interpretation of the coefficients:

The intercept term (-0.020) represents the predicted value of the dependent variable when all the independent variables are equal to zero.
The coefficient for X1 (0.295) indicates that for every one unit increase in X1, the predicted value of the dependent variable will increase by 0.295 units when holding all other variables constant.
The coefficient for X2 (0.162) indicates that for every one unit increase in X2, the predicted value of the dependent variable will increase by 0.162 units when holding all other variables constant.
The coefficient for X3 (0.685) indicates that for every one unit increase in X3, the predicted value of the dependent variable will increase by 0.685 units when holding all other variables constant.

Note that the coefficient for X2 (taste) has a P-value of 0.178 and a 95% interval that includes zero, so it is not statistically significant at the 0.05 level.

It's important to note that the interpretation of the coefficients depends on the scaling and units of measurement of the variables. Additionally, the interpretation assumes a linear relationship between the independent variables and the dependent variable, as well as that the model meets the assumptions of linear regression.
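
To see how the equation is used, the sketch below plugs one row of the data table into it. The coefficients are copied from the regression output in these notes; the function name `predict_preference` is illustrative.

```python
# Coefficients as reported in the regression output of these notes.
INTERCEPT = -0.020473447
B_NUTRITION = 0.295071203
B_TASTE = 0.161534279
B_PRESERVATION = 0.68468234


def predict_preference(nutrition, taste, preservation):
    """Predicted preference Y for given X1, X2 and X3 scores."""
    return (
        INTERCEPT
        + B_NUTRITION * nutrition
        + B_TASTE * taste
        + B_PRESERVATION * preservation
    )


# Respondent 1 in the data table: X1 = 5, X2 = 6, X3 = 5, observed Y = 7.
predicted = predict_preference(5, 6, 5)
residual = 7 - predicted  # observed minus predicted
print(round(predicted, 3), round(residual, 3))
```

The residual is the part of the observed preference that the fitted equation does not account for.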
Question 7

Step 1: Calculate the Descriptive Statistics for the given data
Step 2: Extract the variance from each of them individually
Step 3: Go to Data Analysis and select "z-Test: Two Sample for Means"
Step 4: Put in all the necessary input ranges and variance values, then click OK
Step 5: Compare Zcal and Ztab/crit and then interpret

Maths   Science
41      47
53      63
54      58
47      53
57      53
51      63
42      53
45      39
54      58
52      50
51      53
51      63
71      61
57      55
50      31
43      50
51      50
60      58
62      55
57      53
35      66
75      72
45      55
57      61
45      39
46      39
66      61
57      58
49      39
49      55
57      47
64      64
63      66

For Example

                     Maths      Science
Mean                 53.24242   54.18182
Standard Error       1.522577   1.613423
Median               52         55
Mode                 57         53
Standard Deviation   8.746536   9.268409
Sample Variance      76.50189   85.90341
Kurtosis             0.309491   0.181763
Skewness             0.423137   -0.596878
Range                40         41
Minimum              35         31
Maximum              75         72
Sum                  1757       1788
Count                33         33

Hypothesis
Ho: There is no significant difference between the mean marks in maths and science
H1: There is a significant difference between the mean marks in maths and science

z-Test: Two Sample for Means

                               Maths       Science
Mean                           53.24242    54.18182
Known Variance                 76.50189    85.90341
Observations                   33          33
Hypothesized Mean Difference   0
z                              -0.423452   (Calculated)
P(Z<=z) one-tail               0.335983
z Critical one-tail            1.644854
P(Z<=z) two-tail               0.671965
z Critical two-tail            1.959964    (Tabulated)

Interpretation:
Since Ztab (1.959964) > |Zcal| (0.423452), Ho is accepted and H1 is rejected,
which means there is no significant difference between the mean marks in maths and science
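
The z statistic in the output can be reproduced from the means, variances and counts alone. This is a minimal sketch using only the standard library; the two-tail critical value 1.959964 is taken from the output rather than computed.

```python
from math import erfc, sqrt

# Summary values from the descriptive statistics in these notes.
mean_maths, mean_science = 53.24242, 54.18182
var_maths, var_science = 76.50189, 85.90341
n_maths = n_science = 33
hypothesized_difference = 0

# Standard error of the difference between the two means
standard_error = sqrt(var_maths / n_maths + var_science / n_science)
z_cal = (mean_maths - mean_science - hypothesized_difference) / standard_error

# Two-tail p-value from the standard normal distribution
p_two_tail = erfc(abs(z_cal) / sqrt(2))

z_tab = 1.959964  # z Critical two-tail at the 0.05 level, from the output
h0_accepted = abs(z_cal) < z_tab

print(round(z_cal, 6), round(p_two_tail, 6), h0_accepted)
```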

Q9
Data (50 observations, in the original order, reading across)
103.3    104.15   104.45   105.1    102.45   98.65    97.35    98.85    98.8     97.75
98.55    102.15   108.05   106.5    96.1     99.1     99.65    99.3     94.9     103.1
96.85    95.6     93.25    94.9     85       96.1     101.9    103.6    103.4    101.75
96.15    98.2     106.4    105.75   110.5    109.1    110.9    109.95   97.2     98.1
99.4     99.5     103.1    104      91.6     96.35    99.35    101.15   97.55    98

Column1
Mean                 100.457
Standard Error       0.717336
Median               99.375
Mode                 96.1
Standard Deviation   5.072334
Sample Variance      25.72857
Kurtosis             0.789384
Skewness             -0.113442
Range                25.9
Minimum              85
Maximum              110.9
Sum                  5022.85
Count                50
Scroll down for the interpretation of Question 9

Interpretation of the maths and science descriptive statistics:
The average marks of maths and science are 53.24 and 54.18 respectively
52 and 55 are the medians (middle values) of the maths and science marks
57 and 53 are the modes (most frequently repeated marks) of the maths and science marks
8.75 and 9.27 are the standard deviations of the maths and science marks respectively
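
The descriptive statistics quoted above can be checked with Python's standard `statistics` module. The sketch below uses the maths marks from Question 7; kurtosis and skewness are left out because the module has no functions for them.

```python
import statistics
from math import sqrt

# Maths marks from Question 7 (33 students).
maths = [
    41, 53, 54, 47, 57, 51, 42, 45, 54, 52, 51,
    51, 71, 57, 50, 43, 51, 60, 62, 57, 35, 75,
    45, 57, 45, 46, 66, 57, 49, 49, 57, 64, 63,
]

summary = {
    "Mean": statistics.mean(maths),
    "Standard Error": statistics.stdev(maths) / sqrt(len(maths)),
    "Median": statistics.median(maths),
    "Mode": statistics.mode(maths),
    "Standard Deviation": statistics.stdev(maths),   # sample standard deviation
    "Sample Variance": statistics.variance(maths),   # divides by n - 1
    "Range": max(maths) - min(maths),
    "Minimum": min(maths),
    "Maximum": max(maths),
    "Sum": sum(maths),
    "Count": len(maths),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Swapping in the science marks or the Question 9 values gives the other columns the same way.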

Interpretation (Question 9)

Based on the descriptive statistics provided, we can make the following interpretations about the Steel Authority of India dataset:

The mean of the data is 100.457, indicating that the average value is close to 100.
The standard deviation of the data is 5.072, suggesting that the data points are somewhat spread out around the mean.
The range of the data is 25.9, which is the difference between the minimum and maximum values in the dataset.
The median of the data is 99.375, which is the value that separates the dataset into two equal halves.
The mode of the data is 96.1, which is the most frequently occurring value in the dataset.
The skewness of the data is -0.113, indicating that the data is roughly symmetric.
The kurtosis of the data is 0.789, suggesting that the data is slightly more peaked than a normal distribution.
The sample variance is 25.729, which measures the spread of the data around the mean.
The standard error is 0.717, which is the standard deviation of the sampling distribution of the mean.
The dataset contains 50 observations.

These descriptive statistics provide a summary of the central tendency, variability, and shape of the Steel Authority of India dataset. They can be useful in identifying patterns and trends in the data, as well as detecting any outliers or unusual values. However, they should be interpreted in conjunction with the context and purpose of the analysis, as well as any limitations of the data.
