
UNIVERSITY OF MAKATI

J.P. Rizal Extension, Brgy. West Rembo, Makati City

Advanced Statistics:

Sign Test, Kolmogorov Test and

Jarque-Bera Test

By:

De Leon, John Vincent

Sabatin, June Evan M.

Vitug, Arianne

III-MATH

SIGN TEST
Many of the hypothesis tests studied so far have imposed one or more requirements on the population distribution – such as the population being normal or the variances being equal. A nonparametric test is a hypothesis test that does not require any specific conditions concerning the shape of the populations or the value of any population parameters. Nonparametric tests are sometimes called distribution-free statistics because they do not require that the data fit a normal distribution.

Another important reason for using these tests is that they allow for the analysis of categorical as well as rank data. They are widely used for studying populations that take on a ranked order, such as a movie review that receives one to four stars. Nonparametric tests are usually easier to perform than parametric tests. However, they are usually less efficient than parametric tests.

One of the easiest nonparametric tests to perform is the sign test. It is a nonparametric test that can

be used to test a population median against a hypothesized value.

● The Sign Test (One Sample Problem)

● The Sign Test (Matched Pairs)

THE SIGN TEST (ONE SAMPLE PROBLEM)

This is the quickest and simplest nonparametric method. In this case, we consider testing samples from the same population. In the sign test, we are testing a hypothesis about the median (M) rather than the mean.

The general steps of hypothesis testing can still be followed. The procedural steps are as follows:

Step 1: Hypothesis

Null hypothesis:

H₀: M = M₀  OR  H₀: p = 1/2

Alternative hypothesis:

H₁: M ≠ M₀  OR  H₁: p ≠ 1/2 (two-tailed test)

H₁: M > M₀  OR  H₁: p > 1/2 (upper one-tailed test)

H₁: M < M₀  OR  H₁: p < 1/2 (lower one-tailed test)

Specify the level of significance, α.

Note: In this module we will only consider the upper-tailed and two-tailed hypotheses.

Step 2: Rejection Criteria

Reject H₀ if |Z| > Z_(α/2) (two-sided hypothesis) OR

Z_cal > Z_α (one-sided upper tail hypothesis)

Step 3: Test Statistic

● Use the data provided to record the signs.

○ If an observation is greater than M₀, record +.

○ If an observation is less than M₀, record –.

○ If an observation is equal to M₀, ignore it.

● Obtain n: n = n₊ + n₋, where

■ n₊ = number of observations with positive sign

■ n₋ = number of observations with negative sign

● Obtain p: p = 1/2 or 0.5.

■ It is always 1/2 because the population is assumed to be symmetrical and the distribution is assumed to be binomial with equal chances of positive and negative outcomes.

■ q = 1 – p = 1/2.

● Using the normal approximation to the binomial distribution, we can say that R ~ N(np, np(1 − p)) = N(n/2, n/4), where np = n/2 is the mean and np(1 − p) = n/4 is the variance.

● For the two-sided hypothesis we consider r = min(n₊, n₋), and for the one-sided upper tail hypothesis we consider the number of positives.

● Apply the continuity correction:

■ P(R ≥ r) = P(R > r − 1/2) for the one-tailed hypothesis

■ P(R ≤ r) = P(R < r + 1/2) for the two-tailed hypothesis

● Calculate the test statistic:

■ z = (r ± 0.5 − μ)/σ, where μ = n/2 and σ² = n/4

Step 4: Decision

Reject H₀ if

|Z| > Z_(α/2) (two-sided hypothesis)

Z > Z_α (one-sided upper tail hypothesis)

Step 5: Conclusion

Make your conclusion based on your decision.
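The steps above translate directly into code. Below is a minimal Python sketch of the procedure (the function name and interface are illustrative, not part of the module); it uses the normal approximation with continuity correction exactly as in Step 3, applied here to the scores of Example 1 below.

```python
import math

def sign_test(data, m0, alternative="two-sided"):
    """One-sample sign test for the median m0, using the normal
    approximation to Binomial(n, 1/2) with continuity correction."""
    n_pos = sum(1 for x in data if x > m0)
    n_neg = sum(1 for x in data if x < m0)   # ties with m0 are ignored
    n = n_pos + n_neg
    mu, sigma = n / 2, math.sqrt(n / 4)      # mean n/2, variance n/4
    if alternative == "two-sided":
        r = min(n_pos, n_neg)                # take the smaller count
        z = (r + 0.5 - mu) / sigma           # P(R <= r) = P(R < r + 1/2)
    else:                                    # upper tail: number of positives
        r = n_pos
        z = (r - 0.5 - mu) / sigma           # P(R >= r) = P(R > r - 1/2)
    return n_pos, n_neg, z

scores = [26, 46, 39, 58, 62, 41, 65, 49, 54, 50,
          61, 38, 58, 35, 27, 34, 46, 51, 29, 40]
n_pos, n_neg, z = sign_test(scores, 50)
print(n_pos, n_neg, round(z, 2))   # 7 12 -0.92
```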

EXAMPLE 1

A questionnaire used in an assessment is thought to give a median score of 50 in a group doing a particular course. When tried out on 20 students of another course, it gave the scores: 26, 46, 39, 58, 62, 41, 65, 49, 54, 50, 61, 38, 58, 35, 27, 34, 46, 51, 29, 40. Test, at the 5% level of significance, the hypothesis that the median is 50 against the alternative that it is not.

SOLUTION

Step 1: Hypothesis

H o : M =50

H 1 : M ≠ 50

α = 0.05 → two-sided: α/2 = 0.05/2 = 0.025

Step 2: Rejection Criteria


Since it is two-sided, reject H₀ if |Z| ≥ Z_(α/2) = 1.96

Step 3: Test Statistics

Calculate n by adding the number of positive signs (observations greater than the hypothesized median) and negative signs (observations less than it):

26, 46, 39, 58, 62, 41, 65, 49, 54, 50, 61, 38, 58, 35, 27, 34, 46, 51, 29, 40

The hypothesized median is 50; the observation equal to 50 is ignored.

n₊ = 7

n₋ = 12

n = n₊ + n₋ = 7 + 12 = 19

We know that our p = 1/2 and our q = 1/2.

Solving for r (two-sided, so take the smaller count):

r = min(n₊, n₋) = min(7, 12) = 7

Solving for R ~ N(np, npq), where np = µ and npq = σ²:

R ~ N(19 × 0.5, 19 × 0.5 × 0.5)

R ~ N(9.5, 4.75), so µ = 9.5 and σ² = 4.75

Apply the continuity correction:

P(R ≤ r) = P(R < r + 1/2) = P(R < 7 + 0.5) = P(R < 7.5)

Calculate the test statistic:

z = (r ± 0.5 − μ)/σ

z = (7 + 0.5 − 9.5)/√4.75

z = −0.92

Step 4: Decision


Reject H₀ if |Z| ≥ Z_(α/2) = 1.96

Since our Z = −0.92:

|−0.92| = 0.92

0.92 < 1.96

∴ Fail to reject H₀

Step 5: Conclusion

At the 5% significance level, we fail to reject H₀.

∴ There is insufficient evidence to conclude that the median differs from 50.
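As a cross-check of this result (not part of the original solution), the exact binomial version of the sign test can be run with scipy, assuming it is available: 7 of the 19 non-tied observations exceed 50, tested against p = 1/2.

```python
from scipy.stats import binomtest

# Exact two-sided sign test: 7 "+" signs out of n = 19 non-tied observations.
res = binomtest(7, n=19, p=0.5, alternative="two-sided")
print(round(res.pvalue, 4))  # ≈ 0.3593, well above 0.05 -> fail to reject H0
```

The exact p-value agrees with the normal-approximation conclusion above.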

EXAMPLE 2

The following are measurements of breaking strength of a certain kind of 2-inch cotton ribbon in pounds:

163 165 160 189 161 171 158 151 169 162

163 169 172 165 148 166 172 163 187 173

Use the sign test to test the null hypothesis M = 160 against the alternative hypothesis M > 160 at the 0.025 level of significance.

SOLUTION:

Step 1: Hypothesis

H o : M =160

H 1 : M >160

α =0.025 → One−sided test

Step 2: Rejection Criteria

Reject H₀ if Z > Z_α = Z_(0.025) = 1.96

Step 3: Test Statistics


Calculate n by adding the number of positive signs (observations greater than 160) and negative signs (observations less than 160):

163 165 160 189 161 171 158 151 169 162
163 169 172 165 148 166 172 163 187 173

The hypothesized median is 160; the single observation equal to 160 is ignored.

n₊ = 16

n₋ = 3

n = n₊ + n₋ = 16 + 3 = 19

We know that our p = 1/2 and our q = 1/2.

Solving for r (one-sided upper tail, so take the number of positives):

r = n₊ = 16

Solving for R ~ N(np, npq), where np = µ and npq = σ²:

R ~ N(19 × 0.5, 19 × 0.5 × 0.5)

R ~ N(9.5, 4.75), so µ = 9.5 and σ² = 4.75

Apply the continuity correction:

P(R ≥ r) = P(R > r − 1/2) = P(R > 16 − 0.5) = P(R > 15.5)

Calculate the test statistic:

z = (r ± 0.5 − μ)/σ

z = (16 − 0.5 − 9.5)/√4.75

z = 2.75

Step 4: Decision

Reject H₀ if Z > Z_α = Z_(0.025) = 1.96

Since our Z = 2.75:

2.75 > 1.96

∴ Reject H₀

Step 5: Conclusion

At the 2.5% significance level, we reject H₀.

∴ There is sufficient evidence to conclude that the median is greater than 160.
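The sign tally and test statistic above are easy to verify with a few lines of code (a quick check added for illustration, not part of the original module):

```python
import math

data = [163, 165, 160, 189, 161, 171, 158, 151, 169, 162,
        163, 169, 172, 165, 148, 166, 172, 163, 187, 173]
n_pos = sum(x > 160 for x in data)   # observations above 160
n_neg = sum(x < 160 for x in data)   # observations below 160
n = n_pos + n_neg                    # the single 160 is ignored
# Upper one-tailed: r = n_pos, continuity correction subtracts 1/2.
z = (n_pos - 0.5 - n / 2) / math.sqrt(n / 4)
print(n_pos, n_neg, round(z, 2))     # 16 3 2.75
```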

THE SIGN TEST (MATCHED PAIRS)

⮚ Suppose two related samples are taken for comparison from the same population; note that we then have pairs of observations which are related to each other.

⮚ The general procedure is as follows:

Step 1: Hypotheses

Null hypothesis:

H₀: M_Diff = 0  OR  H₀: p = 1/2

Alternative hypotheses:

H₁: M_Diff ≠ 0  OR  H₁: p ≠ 1/2 (two-tailed test)

H₁: M_Diff > 0  OR  H₁: p > 1/2 (upper one-tailed test)

H₁: M_Diff < 0  OR  H₁: p < 1/2 (lower one-tailed test)

Specify the level of significance, α.

Note: In this module we will only consider the upper-tailed and two-tailed hypotheses.

Step 2: Rejection Criteria

Reject H₀ if

|Z| > Z_(α/2) (two-sided hypothesis)

Z > Z_α (one-sided upper tail hypothesis)

Step 3: Test Statistic

❖ Find the difference between the two samples (data sets) and label accordingly:

● If the difference is greater than 0, record +.

● If the difference is less than 0, record –.

● If the difference is equal to 0, ignore it.

❖ Obtain n: n = n₊ + n₋, where

n₊ = number of differences with positive sign

n₋ = number of differences with negative sign

❖ Obtain p: p = 1/2 or 0.5.

It is always 1/2 because the population is assumed to be symmetrical and the distribution is assumed to be binomial with equal chances of positive and negative outcomes.

q = 1 – p = 1/2.

❖ Using the normal approximation to the binomial distribution, we can say that R ~ N(np, np(1 − p)) = N(n/2, n/4), where np = n/2 is the mean and np(1 − p) = n/4 is the variance.

❖ For the two-sided hypothesis we consider r = min(n₊, n₋), and for the one-sided upper tail hypothesis we consider the number of positives.

❖ Apply the continuity correction:

P(R ≥ r) = P(R > r − 1/2) for the one-tailed hypothesis (subtract 1/2 when r > n/2)

P(R ≤ r) = P(R < r + 1/2) for the two-tailed hypothesis (add 1/2 when r < n/2)

❖ Calculate the test statistic:

z = (r ± 0.5 − μ)/σ, where μ = n/2 and σ² = n/4

Step 4: Decision

Reject H₀ if

|Z| > Z_(α/2) (two-sided hypothesis)

Z > Z_α (one-sided upper tail hypothesis)


Step 5: Conclusion

Make your conclusion based on your decision.

EXAMPLE 1

The assessments for nine patients are shown in the table.

Patient   Treatment A   Treatment B
1         36.3          35.1
2         48.4          46.8
3         40.2          37.3
4         54.7          50.6
5         28.7          29.1
6         42.8          41.0
7         36.1          35.3
8         39.0          39.1
9         36.0          36.0

Use the sign test to determine whether the data present sufficient evidence to indicate that one of the treatments tends to be consistently more effective than the other; that is, P(Y_A > Y_B) ≠ 1/2. Test by using α = 0.05.

SOLUTION

Step 1: Hypothesis
H₀: M_Diff = 0  OR  H₀: p = 1/2

H₁: M_Diff ≠ 0  OR  H₁: p ≠ 1/2

α = 0.05 → two-sided: α/2 = 0.025
2

Step 2: Rejection Criteria


Since it is two-sided, reject H₀ if |Z| > Z_(α/2) = 1.96

Step 3: Test Statistic

Calculate the difference between Treatment A and Treatment B for the 9 patients.

Patient   Treatment A   Treatment B   A − B    Sign (+, −)
1         36.3          35.1           1.2     +
2         48.4          46.8           1.6     +
3         40.2          37.3           2.9     +
4         54.7          50.6           4.1     +
5         28.7          29.1          −0.4     −
6         42.8          41.0           1.8     +
7         36.1          35.3           0.8     +
8         39.0          39.1          −0.1     −
9         36.0          36.0           0       ignore

Count the positive and negative signs:

n₊ = 6

n₋ = 2

n = n₊ + n₋ = 6 + 2 = 8

We know that our p = 1/2 and our q = 1/2.

Solving for r (two-sided, so take the smaller count):

r = min(n₊, n₋) = min(6, 2) = 2

Solving for R ~ N(np, npq), where np = µ and npq = σ²:

R ~ N(8 × 0.5, 8 × 0.5 × 0.5)

R ~ N(4, 2), so µ = 4 and σ² = 2

Apply the continuity correction:

P(R ≤ r) = P(R < r + 1/2) = P(R < 2 + 0.5) = P(R < 2.5)

Calculate the test statistic:

z = (r ± 0.5 − μ)/σ

z = (2 + 0.5 − 4)/√2

z = −1.06

Step 4: Decision


Reject H₀ if |Z| ≥ Z_(α/2) = 1.96

Since our Z = −1.06:

|−1.06| = 1.06

1.06 < 1.96

∴ Fail to reject H₀
Step 5: Conclusion

At the 5% significance level, we fail to reject H₀. There is insufficient evidence to conclude that one treatment is consistently more effective than the other.
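The matched-pairs computation can be checked with a short script (added for illustration; not part of the original module):

```python
import math

a = [36.3, 48.4, 40.2, 54.7, 28.7, 42.8, 36.1, 39.0, 36.0]  # Treatment A
b = [35.1, 46.8, 37.3, 50.6, 29.1, 41.0, 35.3, 39.1, 36.0]  # Treatment B
diffs = [x - y for x, y in zip(a, b)]
n_pos = sum(d > 0 for d in diffs)
n_neg = sum(d < 0 for d in diffs)    # the zero difference (patient 9) is ignored
n = n_pos + n_neg
r = min(n_pos, n_neg)                # two-sided: take the smaller count
z = (r + 0.5 - n / 2) / math.sqrt(n / 4)
print(n_pos, n_neg, round(z, 2))     # 6 2 -1.06
```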

Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov test is used to test the null hypothesis that a set of data comes from a given distribution, most commonly the Normal distribution. The test produces a statistic that is compared with a critical value which depends on the sample size.

Null hypothesis – a statement asserting that there is no relationship between two variables.

What does the Kolmogorov-Smirnov test show?

The two-sample Kolmogorov-Smirnov test is a nonparametric test that compares the cumulative distributions of two data sets. The KS test reports the maximum difference between the two cumulative distributions and calculates a P value from that and the sample sizes.
SPSS Kolmogorov-Smirnov Test for Normality

What is a Kolmogorov-Smirnov normality test?

The Kolmogorov-Smirnov test examines if scores are likely to follow some distribution in some population. To avoid confusion, there are two Kolmogorov-Smirnov tests:

● the one-sample Kolmogorov-Smirnov test, for testing if a variable follows a given distribution in a population. This "given distribution" is usually (though not always) the normal distribution, hence "Kolmogorov-Smirnov normality test".

● the (much less common) independent-samples Kolmogorov-Smirnov test, for testing if a variable has identical distributions in two populations.

In theory, "Kolmogorov-Smirnov test" could refer to either test (though it usually refers to the one-sample test), so the bare term is best avoided. By the way, both Kolmogorov-Smirnov tests are present in SPSS.

Kolmogorov-Smirnov Test - Simple Example

Say I have a population of 1,000,000 people. I think their reaction times on some tasks are perfectly normally distributed. I sample 233 of these people and measure their reaction times. Now the observed frequency distribution of these will probably differ a bit (but not too much) from a normal distribution. So I run a histogram over the observed reaction times and superimpose a normal distribution with the same mean and standard deviation. The result is shown below.
The frequency distribution of my scores doesn't entirely overlap with my normal curve. Now, I could calculate the percentage of cases that deviate from the normal curve (the percentage of red areas in the chart). This percentage is a test statistic: it expresses in a single number how much my data differ from my null hypothesis. So it indicates to what extent the observed scores deviate from a normal distribution.

Now, if my null hypothesis is true, then this deviation percentage should probably be quite small. That is, a small deviation has a high probability value or p-value. Conversely, a huge deviation percentage is very unlikely and suggests that my reaction times don't follow a normal distribution in the entire population. So a large deviation has a low p-value. As a rule of thumb, we reject the null hypothesis if p < 0.05. So if p < 0.05, we don't believe that our variable follows a normal distribution in our population.


Kolmogorov-Smirnov Test - Test Statistic

So that's the easiest way to understand how the Kolmogorov-Smirnov normality test works. Computationally, however, it works differently: it compares the observed versus the expected cumulative relative frequencies, as shown below.

The Kolmogorov-Smirnov test uses the maximal absolute difference between these curves as its test statistic, denoted by D. In this chart, the maximal absolute difference D is (0.48 − 0.41 =) 0.07, and it occurs at a reaction time of 960 milliseconds. Keep in mind that D = 0.07, as we'll encounter it in our SPSS output in a minute.
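To make the computation concrete, here is a sketch using scipy (assumed available) on simulated reaction times; the numbers are illustrative, not the SPSS example's actual data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rt = rng.normal(loc=950, scale=100, size=233)  # simulated reaction times (ms)

# One-sample KS test against a normal distribution whose mean and sd are
# taken from the sample itself, mirroring the example above. (Estimating
# the parameters from the same data makes the test conservative.)
d, p = stats.kstest(rt, "norm", args=(rt.mean(), rt.std()))
print(round(d, 3), round(p, 3))
```

D here plays the same role as the 0.07 in the chart: the largest vertical gap between the empirical and theoretical cumulative curves.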

Jarque-Bera Test
In statistics, the Jarque–Bera test is a goodness-of-fit test: it tests whether sample data have the skewness and kurtosis matching a normal distribution.

This test is named after Carlos Jarque, a Mexican economist who is currently Executive Director and a Board Member of America Movil, and Anil K. Bera, an Indian econometrician who is Professor of Economics in the Department of Economics at the University of Illinois at Urbana–Champaign. They derived it while working on their Ph.D. theses at the Australian National University.

The Jarque-Bera test is a test for normality. Normality is one of the assumptions for many statistical tests, like the t-test or F-test; the Jarque-Bera test is usually run before one of these tests to confirm normality. It is usually used for large data sets, because other normality tests are not reliable when n is large.

The data could take many forms, including:

● Time Series Data.

● Errors in a regression model.

● Data in a Vector.

A normal distribution has a skew of zero (i.e. it’s perfectly symmetrical around the mean) and a

kurtosis of three; kurtosis tells you how much data is in the tails and gives you an idea about how

“peaked” the distribution is. It’s not necessary to know the mean or the standard deviation for the

data in order to run the test. This test statistic is always positive, and if it is not close to zero, it

shows that the sample data do not have a normal distribution.


The formula for the Jarque-Bera test statistic (usually shortened to just JB test statistic) is:

JB = n [ (√b₁)²/6 + (b₂ − 3)²/24 ]

where:

√b₁ = the skewness coefficient,
b₂ = the kurtosis coefficient,
n = the sample size.

Formulas for getting skewness and kurtosis (the sums run over i = 1, …, n):

skewness = [ (1/n) Σ (xᵢ − x̄)³ ] / [ (1/n) Σ (xᵢ − x̄)² ]^(3/2)

kurtosis = [ (1/n) Σ (xᵢ − x̄)⁴ ] / [ (1/n) Σ (xᵢ − x̄)² ]²

If JB > χ²(α, 2), then the null hypothesis is rejected, meaning the data are not normally distributed, where α is the significance level.

The null hypothesis for the test is that the data is normally distributed. The alternate hypothesis is

that the data does not come from a normal distribution.
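To make the computation concrete, here is a minimal Python sketch (added for illustration) that builds JB from the population moments, following the formulas above; the sample used is the 15 scores from the Excel example later in this module.

```python
def jarque_bera(x):
    """JB = n * [S^2/6 + (K - 3)^2/24], with skewness S and kurtosis K
    computed from the population (1/n) moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # variance
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5                      # third standardized moment
    kurt = m4 / m2 ** 2                        # fourth standardized moment
    return n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)

scores = [10, 8, 9, 5, 7, 6, 6, 9, 6, 6, 8, 5, 7, 8, 4]
jb = jarque_bera(scores)
print(round(jb, 3), jb < 5.99)  # statistic compared with chi-square(0.05, 2)
```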


Properties of the skewness measure:

1. Zero skewness implies a symmetric distribution (e.g. the Normal or t-distribution).

2. Positive skewness means that the distribution has a long right tail; it is skewed to the right.

3. Negative skewness means that the distribution has a long left tail; it is skewed to the left.

Properties of the kurtosis measure:

1. A distribution with kurtosis = 3 is said to be mesokurtic.

2. A distribution with kurtosis > 3 is said to be leptokurtic or fat-tailed. For example, stock returns are known to be leptokurtic, i.e. more "peaked" and fat-tailed than the normal distribution.

3. A distribution with kurtosis < 3 is said to be platykurtic.

The Jarque-Bera test uses these two (statistical) properties of the normal

distribution, namely:

The Normal distribution is symmetric around its mean (skewness = zero)

The Normal distribution has kurtosis three, or Excess kurtosis = zero

How to do a Jarque-Bera test in practice

1. Calculate the skewness in the sample.

2. Calculate the kurtosis in the sample.

3. Calculate the Jarque-Bera test statistic.

4. Compare the Jarque-Bera test statistic with the critical values in the chi-square table, 2 df. (Dwight, 2019)

Example: for a sample with (1/n) Σ (xᵢ − x̄)³ = −51.4, (1/n) Σ (xᵢ − x̄)⁴ = 4,203.8, and variance 37.8:

skewness = −51.4 / (37.8)^(3/2) = −0.2212

kurtosis = 4,203.8 / (37.8)² = 2.9421

JB = n [ (−0.2212)²/6 + (2.9421 − 3)²/24 ]

χ²(0.05; 2) = 5.99

The null hypothesis is not rejected because JB < χ²(α, 2).

What the Results Mean

In general, a large J-B value indicates that errors are not normally distributed.


For example, in MATLAB, a result of 1 means that the null hypothesis has been rejected at the 5% significance level; in other words, the data does not come from a normal distribution. A value of 0 indicates that the null hypothesis cannot be rejected, i.e. the data are consistent with a normal distribution.

Unfortunately, most statistical software does not support this test. In order to interpret results,

you may need to do a little comparison (and so you should be intimately familiar with hypothesis

testing). Checking p-values is always a good idea. For example, a tiny p-value and a large chi-

square value from this test means that you can reject the null hypothesis that the data is normally

distributed.

If the data comes from a normal distribution, the JB statistic asymptotically has a chi-squared

distribution with two degrees of freedom, so the statistic can be used to test the hypothesis that

the data are from a normal distribution. The null hypothesis is a joint hypothesis of the skewness

being zero and the excess kurtosis being zero. Samples from a normal distribution have an

expected skewness of 0 and an expected excess kurtosis of 0 (which is the same as a kurtosis of

3). As the definition of JB shows, any deviation from this increases the JB statistic.

Steps For Jarque-Bera test using Excel

Step 1: Input the data. First, input the dataset into one column.

Step 2: Calculate the Jarque-Bera test statistic. With n in cell C2, the skewness in C3, and the kurtosis in C4, first get:

Skewness: =SKEW(A2:A16)
Kurtosis: =KURT(A2:A16)
JB test statistic: =(C2/6)*(C3^2+(C4^2/4))

(Excel's KURT returns excess kurtosis, so the JB formula divides K² by 4 rather than using (K − 3)²/4.)

Score data: 10, 8, 9, 5, 7, 6, 6, 9, 6, 6, 8, 5, 7, 8, 4

n (sample size)       15
S (skewness)          0.118547
K (kurtosis)          -0.76348
JB (test statistic)   0.3994

Step 3: Calculate the p-value of the test.

Recall that under the null hypothesis of normality, the test statistic JB follows a chi-square distribution with 2 degrees of freedom. Thus, to find the p-value for the test we use the following function in Excel: =CHISQ.DIST.RT(JB test statistic, 2)

The p-value of the test is 0.8190. Since this p-value is greater than 0.05, we fail to reject the null hypothesis. We do not have enough evidence to say that the dataset is not normally distributed.
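The same test is available in scipy (assumed installed). Note that scipy computes skewness and kurtosis from the population moments rather than from Excel's sample-adjusted SKEW and KURT, so its statistic differs somewhat; the conclusion is the same.

```python
from scipy.stats import jarque_bera

scores = [10, 8, 9, 5, 7, 6, 6, 9, 6, 6, 8, 5, 7, 8, 4]
stat, p = jarque_bera(scores)
print(round(stat, 3), round(p, 3))  # p well above 0.05 -> fail to reject normality
```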

Reference:

https://www.spss-tutorials.com/spss-kolmogorov-smirnov-test-for-normality

https://www.graphpad.com/guides/prism/latest/statistics/interpreting_results_kolmogorov-smirnov_test.htm

https://keydifferences.com/difference-between-null-and-alternative-hypothesis.html
https://collinsdwight.medium.com/jarque-bera-test-of-normality-a108a1515b22

https://www.statisticshowto.com/jarque-bera-test

https://digensia.wordpress.com/2012/05/07/the-jarque-bera-test-for-normality-testing

https://www.r-bloggers.com

https://www.statology.org/jarque-bera-test-excel

https://www.youtube.com/watch?v=ZbjsXS8oKfo

https://onlinepubs.trb.org/onlinepubs/nchrp/cd-22/manual/v2appendixc.pdf

https://www.sagepub.com/sites/default/files/upm-binaries/40007_Chapter8.pdf

https://www.youtube.com/watch?v=ztmua4TrLLM
