SOORAJ S
Registration number: 11809348
Properties of the normal distribution: it is symmetric; the mean, median and mode are equal; the empirical rule; skewness and kurtosis.
The mean, median and mode are equal - The highest frequency, that is, the point with the
most observations of the variable, lies in the middle of a normal distribution. All three
measures fall at this same central point. In a perfectly normal distribution they are
exactly equal.
Empirical rule - When data is normally distributed, the area under the curve within a given number of
standard deviations of the mean is constant. About 68.27 percent of all observations, for instance, lie
within +/- one standard deviation of the mean. About 95.45 percent lie within two standard deviations
of the mean, and about 99.73 percent lie within three standard deviations of the mean. These figures are
usually rounded to 68, 95 and 99.7 percent.
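The percentages in the empirical rule can be checked numerically. The short sketch below is an illustration added here, not part of the original answer; the function name area_within is a hypothetical helper. It uses the fact that, for a normal distribution, the area within k standard deviations of the mean equals erf(k / sqrt(2)).

```python
import math

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    # For a normal distribution, P(|X - mean| <= k * sd) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k) * 100:.2f}%")
# Prints 68.27%, 95.45% and 99.73% for k = 1, 2 and 3.
```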
Skewness and Kurtosis - The coefficients of skewness and kurtosis indicate how far a distribution departs
from a normal distribution. Skewness measures how asymmetric the distribution is, whereas kurtosis
measures how heavy its tails are compared with the tails of a normal distribution.
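As a rough illustration of these two coefficients, the sketch below computes the moment-based skewness and excess kurtosis of a small data set. The data sets and the function name are hypothetical, and the population (divide-by-n) formulas are assumed; statistical packages often apply small-sample corrections that give slightly different values.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    # Subtract 3 so that a normal distribution scores 0.
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_skewed))
```

A skewness of 0 indicates symmetry, a positive value indicates a longer right tail, and a negative excess kurtosis indicates tails lighter than those of a normal distribution.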
Q2. Discuss any two non-parametric tests in detail.
What is a non-parametric test?
Non-parametric tests, often referred to as distribution-free tests, make fewer assumptions about the
data set than parametric tests. They are generally considered less powerful, because they usually work
with the ranks or signs of the data rather than the actual values. Common non-parametric tests are:
1. Kruskal-Wallis test
2. Spearman’s rank correlation
3. Wilcoxon signed-rank test
4. Wilcoxon rank-sum test
Spearman’s rank correlation - Spearman's rank correlation measures the strength and direction of the
association between two ranked variables. In other words, it measures how well the relationship between
two variables can be described by a monotonic function.
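Example:
Marks of five subjects in Maths and English:
Subject   Maths   English
A         35      24
B         20      35
C         49      39
D         44      48
E         30      45
Ranking each set of marks (rank 1 = highest mark) and taking d as the absolute difference between the two ranks:
Subject   Maths   Rank   English   Rank   d   d square
A         35      3      24        5      2   4
B         20      5      35        4      1   1
C         49      1      39        3      2   4
D         44      2      48        1      1   1
E         30      4      45        2      2   4
Sum of d square = 14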
ρ = 1 - (6 * Σd²) / (n(n² - 1)), where Σd² = 4 + 1 + 4 + 1 + 4 = 14 and n = 5
= 1 - (6 * 14) / (5 * (25 - 1))
= 1 - 84 / 120
= 0.3
The Spearman’s rank correlation for the given data is 0.3. The value is positive but closer to 0
than to 1, which means that there is a weak positive correlation between the two sets of ranks.
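The calculation above can be reproduced with a few lines of code. This is a minimal sketch using the marks from the example; the function names are hypothetical, and the simple formula used here assumes there are no tied marks.

```python
def ranks_descending(scores):
    """Rank 1 goes to the highest score; assumes no tied scores."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

def spearman_rho(x, y):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    rank_x, rank_y = ranks_descending(x), ranks_descending(y)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

maths = [35, 20, 49, 44, 30]     # subjects A to E
english = [24, 35, 39, 48, 45]

rho = spearman_rho(maths, english)
print(round(rho, 2))  # 0.3
```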
Kruskal-Wallis Test - The Kruskal-Wallis test is the non-parametric alternative to the one-way ANOVA.
Non-parametric means that the test does not assume that your data follow a particular distribution. The
test, also called the H test, is applied when the ANOVA assumptions (such as the assumption of normality)
are not met. Because it uses the ranks of the data values rather than the actual data points, it is
frequently referred to as the one-way ANOVA on ranks.
The test examines whether there is a difference between the medians of two or more groups. As with most
statistical tests, you compute a test statistic and compare it with a cut-off point of a distribution. The
test statistic used in this test is the H statistic. The test's hypotheses are as follows:
• H0: the population medians of all groups are equal.
• H1: the population median of at least one group differs from the others.
The test makes the following assumptions:
• Your observations have to be independent. To put it another way, there shouldn't be any
relationship between the individuals within each group or between the groups.
• The distributions of all groups ought to have the same shape. The majority of testing tools,
including SPSS and Minitab, will check for this condition.
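To make the H statistic concrete, the sketch below computes it with the usual formula H = 12 / (N(N + 1)) * Σ(R_i² / n_i) - 3(N + 1), where R_i is the rank sum of group i, n_i is its size and N is the total number of observations. The three groups of scores are hypothetical, the function name is illustrative, and the code assumes there are no tied values (ties need an additional correction).

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic for independent groups; assumes no ties."""
    # Rank all observations together, from smallest (rank 1) to largest.
    pooled = sorted(value for group in groups for value in group)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_total = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    total = sum(sum(rank[v] for v in group) ** 2 / len(group) for group in groups)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical scores for three independent groups.
group_a = [12, 15, 11]
group_b = [20, 22, 18]
group_c = [30, 28, 35]

h = kruskal_wallis_h(group_a, group_b, group_c)
print(round(h, 2))  # 7.2
```

H is then compared with a chi-square cut-off with (number of groups - 1) degrees of freedom. With three groups and a 5 percent significance level the cut-off is 5.991, so an H of 7.2 would lead to rejecting H0, although with samples this small the chi-square approximation is only rough.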
Q3. Explain Multiple Regression Analysis? Also, discuss the assumptions.
Multiple regression is a statistical technique that can be used to analyze the relationship
between a single dependent variable and several independent variables.
The objective of multiple regression analysis is to use the independent variables, whose values
are known, to predict the value of the single dependent variable.
Example:
A researcher decides to study the performance of students at a school over a period of time. He observes
that as lectures moved online, the students' performance started to decline. Here the dependent variable
is “decrease in performance”, and it is explained by several independent variables such as
“lack of attention, more internet addiction, neglecting studies” and others.
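The idea of predicting one dependent variable from several independent variables can be sketched in code. The example below fits the coefficients by ordinary least squares using the normal equations. The data are hypothetical and were constructed to follow performance = 90 - 2 * (hours online) - 3 * (missed assignments) exactly, so that the result is easy to check; real data would contain noise. The variable and function names are illustrative only.

```python
def fit_multiple_regression(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(row) for row in rows]   # prepend a column of 1s for the intercept
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data: (hours online per day, missed assignments) for five students.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
performance = [82, 83, 72, 73, 65]

intercept, b_hours, b_missed = fit_multiple_regression(predictors, performance)
print(round(intercept, 2), round(b_hours, 2), round(b_missed, 2))

# Predict performance for a new student with 3 hours online and 2 missed assignments.
print(round(intercept + b_hours * 3 + b_missed * 2, 2))
```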
Advantages of Multiple regression
• In the stepwise method, only independent variables with non-zero regression coefficients are included in the regression equation.
• The changes in the multiple standard error of estimate and the coefficient of determination are shown at each step.
• Stepwise multiple regression is efficient at finding a regression equation that contains only significant
regression coefficients.
Assumptions of Multiple regression
• The variance of the residuals (errors) should be constant at all levels of the predicted variable and of the
independent variables (homoscedasticity).
• Between the independent variables, there should be a low degree of correlation (no multicollinearity).
• In multiple regression, the assumption of normality is required. Strictly, it is the residuals of the model,
rather than every variable, that should be approximately normally distributed.
• In multiple regression, the model should be correctly specified. This means that the model
should contain only relevant variables and should reflect the true form of the relationship.
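One of these assumptions, the low correlation between independent variables, can be checked directly before fitting a model. The sketch below computes the Pearson correlation between two hypothetical independent variables. The data, the function name and the cut-off of 0.8 are assumptions made for illustration; the cut-off is a common rule of thumb, not a fixed standard.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical independent variables measured on five students.
hours_online = [1, 2, 3, 4, 5]
missed_assignments = [2, 1, 4, 3, 5]

r = pearson_r(hours_online, missed_assignments)
print(round(r, 2))  # 0.8

# Assumed rule of thumb: flag pairs of predictors with |r| of about 0.8 or more.
if abs(r) >= 0.8 - 1e-9:
    print("High correlation between predictors: possible multicollinearity")
```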
--------------------------------------------**-----------------------------------------------