
UNIVERSITY OF MAKATI

J.P. Rizal Extension, Brgy. West Rembo, Makati City

Advanced Statistics:

Sign Test, Kolmogorov Test and

Jarque-Bera Test

By:

De Leon, John Vincent

Sabatin, June Evan M.

Vitug, Arianne

III-MATH

SIGN TEST
Many of the hypothesis tests studied so far have imposed one or more requirements on the population distribution – such as the population being normal or the variances being equal. A nonparametric test is a hypothesis test that does not require any specific conditions concerning the shape of the populations or the value of any population parameters. Nonparametric tests are sometimes called distribution-free statistics because they do not require that the data fit a normal distribution.

Another important reason for using these tests is that they allow for the analysis of categorical as well as rank data. They are widely used for studying populations that take on a ranked order, such as a movie review that receives one to four stars. Nonparametric tests are usually easier to perform than parametric tests. However, they are usually less efficient than parametric tests.

One of the easiest nonparametric tests to perform is the sign test. It is a nonparametric test that can

be used to test a population median against a hypothesized value.

● The Sign Test (One Sample Problem)

● The Sign Test (Matched Pairs)

THE SIGN TEST (ONE SAMPLE PROBLEM)

This is the quickest and simplest nonparametric method. In this case, we consider testing samples from the same population. In the sign test, we are testing a hypothesis about the median (M) rather than the mean.

The general steps of hypothesis testing can still be followed. The procedural steps are as follows:

Step 1: Hypothesis

Null hypothesis:

H₀: M = M₀  OR  H₀: p = 1/2

Alternative hypothesis:

H₁: M ≠ M₀  OR  H₁: p ≠ 1/2 (two-tailed test)

H₁: M > M₀  OR  H₁: p > 1/2 (upper one-tailed test)

H₁: M < M₀  OR  H₁: p < 1/2 (lower one-tailed test)

Specify the level of significance, α.

Note: In this module we will only consider the upper-tailed and two-tailed hypotheses.

Step 2: Rejection Criteria

Reject H₀ if |Z| > Z_(α/2) (two-sided hypothesis) OR

Z_cal > Z_α (one-sided upper tail hypothesis)

Step 3: Test Statistic

● Use the data provided to record the signs.

○ If an observation is greater than M₀, record +.

○ If an observation is less than M₀, record –.

○ If an observation is equal to M₀, ignore it.

● Obtain n: n = n₊ + n₋, where

■ n₊ = number of observations with positive sign

■ n₋ = number of observations with negative sign

● Obtain p: p = 1/2 or 0.5.

■ It is always 1/2 because the population is assumed to be symmetrical and the distribution is assumed to be binomial with equal chances of positive and negative outcomes.

■ q = 1 – p = 1/2.

● Using the normal approximation to the binomial distribution, we can say that R ~ N(np, np(1 − p)) = N(n/2, n/4), where np = n/2 is the mean and np(1 − p) = n/4 is the variance.

● For the two-sided hypothesis we consider r = min(n₊, n₋), and for the one-sided upper tail hypothesis we consider the number of positives.

● Apply the continuity correction:

■ P(R ≥ r) = P(R > r − 1/2) for the one-tailed hypothesis

■ P(R ≤ r) = P(R < r + 1/2) for the two-tailed hypothesis

● Calculate the test statistic:

■ z = (r ± 0.5 − μ)/σ, where μ = n/2 and σ² = n/4

Step 4: Decision

Reject H₀ if

|Z| > Z_(α/2) (two-sided hypothesis)

Z > Z_α (one-sided upper tail hypothesis)

Step 5: Conclusion

Make your conclusion based on your decision.
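The steps above translate directly into code. Below is a minimal Python sketch of the procedure (the function name and interface are illustrative, not part of the module); it uses the normal approximation with continuity correction exactly as in Step 3, applied here to the scores of Example 1 below.

```python
import math

def sign_test(data, m0, alternative="two-sided"):
    """One-sample sign test for the median m0, using the normal
    approximation to Binomial(n, 1/2) with continuity correction."""
    n_pos = sum(1 for x in data if x > m0)
    n_neg = sum(1 for x in data if x < m0)   # ties with m0 are ignored
    n = n_pos + n_neg
    mu, sigma = n / 2, math.sqrt(n / 4)      # mean n/2, variance n/4
    if alternative == "two-sided":
        r = min(n_pos, n_neg)                # take the smaller count
        z = (r + 0.5 - mu) / sigma           # P(R <= r) = P(R < r + 1/2)
    else:                                    # upper tail: number of positives
        r = n_pos
        z = (r - 0.5 - mu) / sigma           # P(R >= r) = P(R > r - 1/2)
    return n_pos, n_neg, z

scores = [26, 46, 39, 58, 62, 41, 65, 49, 54, 50,
          61, 38, 58, 35, 27, 34, 46, 51, 29, 40]
n_pos, n_neg, z = sign_test(scores, 50)
print(n_pos, n_neg, round(z, 2))   # 7 12 -0.92
```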

EXAMPLE 1

A questionnaire used in an assessment is thought to give a median score of 50 in a group doing a particular course. When tried out on 20 students of another course, it gave the scores: 26, 46, 39, 58, 62, 41, 65, 49, 54, 50, 61, 38, 58, 35, 27, 34, 46, 51, 29, 40. Test, at the 5% level of significance, the hypothesis that the median is 50 against the alternative that it is not.

SOLUTION

Step 1: Hypothesis

H o : M =50

H 1 : M ≠ 50

α = 0.05 → two-sided: α/2 = 0.05/2 = 0.025

Step 2: Rejection Criteria


Since it is two-sided, reject H₀ if |Z| ≥ Z_(α/2) = 1.96

Step 3: Test Statistics

Calculate n by adding the number of positive signs (observations greater than the hypothesized median) and negative signs (observations less than it):

26, 46, 39, 58, 62, 41, 65, 49, 54, 50, 61, 38, 58, 35, 27, 34, 46, 51, 29, 40

The hypothesized median is 50; the observation equal to 50 is ignored.

n₊ = 7

n₋ = 12

n = n₊ + n₋ = 7 + 12 = 19

We know that our p = 1/2 and our q = 1/2.

Solving for r (two-sided, so take the smaller count):

r = min(n₊, n₋) = min(7, 12) = 7

Solving for R ~ N(np, npq), where np = µ and npq = σ²:

R ~ N(19 × 0.5, 19 × 0.5 × 0.5)

R ~ N(9.5, 4.75), so µ = 9.5 and σ² = 4.75

Apply the continuity correction:

P(R ≤ r) = P(R < r + 1/2) = P(R < 7 + 0.5) = P(R < 7.5)

Calculate the test statistic:

z = (r ± 0.5 − μ)/σ

z = (7 + 0.5 − 9.5)/√4.75

z = −0.92

Step 4: Decision


Reject H₀ if |Z| ≥ Z_(α/2) = 1.96

Since our Z = −0.92:

|−0.92| = 0.92

0.92 < 1.96

∴ Fail to reject H₀

Step 5: Conclusion

At the 5% significance level, we fail to reject H₀.

∴ There is insufficient evidence to conclude that the median differs from 50.
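As a cross-check of this result (not part of the original solution), the exact binomial version of the sign test can be run with scipy, assuming it is available: 7 of the 19 non-tied observations exceed 50, tested against p = 1/2.

```python
from scipy.stats import binomtest

# Exact two-sided sign test: 7 "+" signs out of n = 19 non-tied observations.
res = binomtest(7, n=19, p=0.5, alternative="two-sided")
print(round(res.pvalue, 4))  # ≈ 0.3593, well above 0.05 -> fail to reject H0
```

The exact p-value agrees with the normal-approximation conclusion above.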

EXAMPLE 2

The following are measurements of breaking strength of a certain kind of 2-inch cotton ribbon in pounds:

163 165 160 189 161 171 158 151 169 162

163 169 172 165 148 166 172 163 187 173

Use the sign test to test the null hypothesis M = 160 against the alternative hypothesis M > 160 at the 0.025 level of significance.

SOLUTION:

Step 1: Hypothesis

H o : M =160

H 1 : M >160

α =0.025 → One−sided test

Step 2: Rejection Criteria

Reject H₀ if Z > Z_α = Z_(0.025) = 1.96

Step 3: Test Statistics


Calculate n by adding the number of positive signs (observations greater than 160) and negative signs (observations less than 160):

163 165 160 189 161 171 158 151 169 162
163 169 172 165 148 166 172 163 187 173

The hypothesized median is 160; the single observation equal to 160 is ignored.

n₊ = 16

n₋ = 3

n = n₊ + n₋ = 16 + 3 = 19

We know that our p = 1/2 and our q = 1/2.

Solving for r (one-sided upper tail, so take the number of positives):

r = n₊ = 16

Solving for R ~ N(np, npq), where np = µ and npq = σ²:

R ~ N(19 × 0.5, 19 × 0.5 × 0.5)

R ~ N(9.5, 4.75), so µ = 9.5 and σ² = 4.75

Apply the continuity correction:

P(R ≥ r) = P(R > r − 1/2) = P(R > 16 − 0.5) = P(R > 15.5)

Calculate the test statistic:

z = (r ± 0.5 − μ)/σ

z = (16 − 0.5 − 9.5)/√4.75

z = 2.75

Step 4: Decision

Reject H₀ if Z > Z_α = Z_(0.025) = 1.96

Since our Z = 2.75:

2.75 > 1.96

∴ Reject H₀

Step 5: Conclusion

At the 2.5% significance level, we reject H₀.

∴ There is sufficient evidence to conclude that the median is greater than 160.
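The sign tally and test statistic above are easy to verify with a few lines of code (a quick check added for illustration, not part of the original module):

```python
import math

data = [163, 165, 160, 189, 161, 171, 158, 151, 169, 162,
        163, 169, 172, 165, 148, 166, 172, 163, 187, 173]
n_pos = sum(x > 160 for x in data)   # observations above 160
n_neg = sum(x < 160 for x in data)   # observations below 160
n = n_pos + n_neg                    # the single 160 is ignored
# Upper one-tailed: r = n_pos, continuity correction subtracts 1/2.
z = (n_pos - 0.5 - n / 2) / math.sqrt(n / 4)
print(n_pos, n_neg, round(z, 2))     # 16 3 2.75
```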

THE SIGN TEST (MATCHED PAIRS)

⮚ Suppose two related samples are taken for comparison from the same population; note that we then have pairs of observations which are related to each other.

⮚ The general procedure is as follows:

Step 1: Hypotheses

Null hypothesis:

H₀: M_Diff = 0  OR  H₀: p = 1/2

Alternative hypotheses:

H₁: M_Diff ≠ 0  OR  H₁: p ≠ 1/2 (two-tailed test)

H₁: M_Diff > 0  OR  H₁: p > 1/2 (upper one-tailed test)

H₁: M_Diff < 0  OR  H₁: p < 1/2 (lower one-tailed test)

Specify the level of significance, α.

Note: In this module we will only consider the upper-tailed and two-tailed hypotheses.

Step 2: Rejection Criteria

Reject H₀ if

|Z| > Z_(α/2) (two-sided hypothesis)

Z > Z_α (one-sided upper tail hypothesis)

Step 3: Test Statistic

❖ Find the difference between the two samples (data sets) and label accordingly:

● If the difference is greater than 0, record +.

● If the difference is less than 0, record –.

● If the difference is equal to 0, ignore it.

❖ Obtain n: n = n₊ + n₋, where

n₊ = number of differences with positive sign

n₋ = number of differences with negative sign

❖ Obtain p: p = 1/2 or 0.5.

It is always 1/2 because the population is assumed to be symmetrical and the distribution is assumed to be binomial with equal chances of positive and negative outcomes.

q = 1 – p = 1/2.

❖ Using the normal approximation to the binomial distribution, we can say that R ~ N(np, np(1 − p)) = N(n/2, n/4), where np = n/2 is the mean and np(1 − p) = n/4 is the variance.

❖ For the two-sided hypothesis we consider r = min(n₊, n₋), and for the one-sided upper tail hypothesis we consider the number of positives.

❖ Apply the continuity correction:

P(R ≥ r) = P(R > r − 1/2) for the one-tailed hypothesis (subtract 1/2 when r > n/2)

P(R ≤ r) = P(R < r + 1/2) for the two-tailed hypothesis (add 1/2 when r < n/2)

❖ Calculate the test statistic:

z = (r ± 0.5 − μ)/σ, where μ = n/2 and σ² = n/4

Step 4: Decision

Reject H₀ if

|Z| > Z_(α/2) (two-sided hypothesis)

Z > Z_α (one-sided upper tail hypothesis)


Step 5: Conclusion

Make your conclusion based on your decision.

EXAMPLE 1

The assessments for nine patients are shown in the table.

Patient   Treatment A   Treatment B
1         36.3          35.1
2         48.4          46.8
3         40.2          37.3
4         54.7          50.6
5         28.7          29.1
6         42.8          41.0
7         36.1          35.3
8         39.0          39.1
9         36.0          36.0

Use the sign test to determine whether the data present sufficient evidence to indicate that one of the treatments tends to be consistently more effective than the other; that is, P(Y_A > Y_B) ≠ 1/2. Test by using α = 0.05.

SOLUTION

Step 1: Hypothesis
H₀: M_Diff = 0  OR  H₀: p = 1/2

H₁: M_Diff ≠ 0  OR  H₁: p ≠ 1/2

α = 0.05 → two-sided: α/2 = 0.025
2

Step 2: Rejection Criteria


Since it is two-sided, reject H₀ if |Z| > Z_(α/2) = 1.96

Step 3: Test Statistic

Calculate the difference between Treatment A and Treatment B for the 9 patients.

Patient   Treatment A   Treatment B   A − B    Sign (+, −)
1         36.3          35.1           1.2     +
2         48.4          46.8           1.6     +
3         40.2          37.3           2.9     +
4         54.7          50.6           4.1     +
5         28.7          29.1          −0.4     −
6         42.8          41.0           1.8     +
7         36.1          35.3           0.8     +
8         39.0          39.1          −0.1     −
9         36.0          36.0           0       ignore

Count the positive and negative signs:

n₊ = 6

n₋ = 2

n = n₊ + n₋ = 6 + 2 = 8

We know that our p = 1/2 and our q = 1/2.

Solving for r (two-sided, so take the smaller count):

r = min(n₊, n₋) = min(6, 2) = 2

Solving for R ~ N(np, npq), where np = µ and npq = σ²:

R ~ N(8 × 0.5, 8 × 0.5 × 0.5)

R ~ N(4, 2), so µ = 4 and σ² = 2

Apply the continuity correction:

P(R ≤ r) = P(R < r + 1/2) = P(R < 2 + 0.5) = P(R < 2.5)

Calculate the test statistic:

z = (r ± 0.5 − μ)/σ

z = (2 + 0.5 − 4)/√2

z = −1.06

Step 4: Decision


Reject H₀ if |Z| ≥ Z_(α/2) = 1.96

Since our Z = −1.06:

|−1.06| = 1.06

1.06 < 1.96

∴ Fail to reject H₀
Step 5: Conclusion

At the 5% significance level, we fail to reject H₀. There is insufficient evidence to conclude that one treatment is consistently more effective than the other.
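The matched-pairs computation can be checked with a short script (added for illustration; not part of the original module):

```python
import math

a = [36.3, 48.4, 40.2, 54.7, 28.7, 42.8, 36.1, 39.0, 36.0]  # Treatment A
b = [35.1, 46.8, 37.3, 50.6, 29.1, 41.0, 35.3, 39.1, 36.0]  # Treatment B
diffs = [x - y for x, y in zip(a, b)]
n_pos = sum(d > 0 for d in diffs)
n_neg = sum(d < 0 for d in diffs)    # the zero difference (patient 9) is ignored
n = n_pos + n_neg
r = min(n_pos, n_neg)                # two-sided: take the smaller count
z = (r + 0.5 - n / 2) / math.sqrt(n / 4)
print(n_pos, n_neg, round(z, 2))     # 6 2 -1.06
```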

Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov test is used to test the null hypothesis that a set of data comes from a given distribution, most commonly the Normal distribution. The test produces a statistic that is compared with a critical value which depends on the sample size.

Null hypothesis – a statement asserting that there is no relationship between two variables.

What does the Kolmogorov-Smirnov test show?

The two-sample Kolmogorov-Smirnov test is a nonparametric test that compares the cumulative distributions of two data sets. The KS test reports the maximum difference between the two cumulative distributions and calculates a P value from that and the sample sizes.
SPSS Kolmogorov-Smirnov Test for Normality

What is a Kolmogorov-Smirnov normality test?

The Kolmogorov-Smirnov test examines if scores are likely to follow some distribution in some population. To avoid confusion, there are two Kolmogorov-Smirnov tests:

● the one-sample Kolmogorov-Smirnov test, for testing if a variable follows a given distribution in a population. This "given distribution" is usually (though not always) the normal distribution, hence "Kolmogorov-Smirnov normality test".

● the (much less common) independent-samples Kolmogorov-Smirnov test, for testing if a variable has identical distributions in two populations.

In theory, "Kolmogorov-Smirnov test" could refer to either test (though it usually refers to the one-sample test), so the bare term is best avoided. By the way, both Kolmogorov-Smirnov tests are present in SPSS.

Kolmogorov-Smirnov Test - Simple Example

Say I have a population of 1,000,000 people. I think their reaction times on some tasks are perfectly normally distributed. I sample 233 of these people and measure their reaction times. Now the observed frequency distribution of these will probably differ a bit (but not too much) from a normal distribution. So I run a histogram over the observed reaction times and superimpose a normal distribution with the same mean and standard deviation. The result is shown below.
The frequency distribution of my scores doesn't entirely overlap with my normal curve. Now, I could calculate the percentage of cases that deviate from the normal curve (the percentage of red areas in the chart). This percentage is a test statistic: it expresses in a single number how much my data differ from my null hypothesis. So it indicates to what extent the observed scores deviate from a normal distribution.

Now, if my null hypothesis is true, then this deviation percentage should probably be quite small. That is, a small deviation has a high probability value or p-value. Conversely, a huge deviation percentage is very unlikely and suggests that my reaction times don't follow a normal distribution in the entire population. So a large deviation has a low p-value. As a rule of thumb, we reject the null hypothesis if p < 0.05. So if p < 0.05, we don't believe that our variable follows a normal distribution in our population.


Kolmogorov-Smirnov Test - Test Statistic

So that's the easiest way to understand how the Kolmogorov-Smirnov normality test works. Computationally, however, it works differently: it compares the observed versus the expected cumulative relative frequencies, as shown below.

The Kolmogorov-Smirnov test uses the maximal absolute difference between these curves as its test statistic, denoted by D. In this chart, the maximal absolute difference D is (0.48 − 0.41 =) 0.07, and it occurs at a reaction time of 960 milliseconds. Keep in mind that D = 0.07, as we'll encounter it in our SPSS output in a minute.
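To make the computation concrete, here is a sketch using scipy (assumed available) on simulated reaction times; the numbers are illustrative, not the SPSS example's actual data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rt = rng.normal(loc=950, scale=100, size=233)  # simulated reaction times (ms)

# One-sample KS test against a normal distribution whose mean and sd are
# taken from the sample itself, mirroring the example above. (Estimating
# the parameters from the same data makes the test conservative.)
d, p = stats.kstest(rt, "norm", args=(rt.mean(), rt.std()))
print(round(d, 3), round(p, 3))
```

D here plays the same role as the 0.07 in the chart: the largest vertical gap between the empirical and theoretical cumulative curves.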

Jarque-Bera Test
In statistics, the Jarque–Bera test is a goodness-of-fit test: it tests whether sample data have the skewness and kurtosis matching a normal distribution.

This test is named after Carlos Jarque, a Mexican economist who is currently Executive Director and a Board Member of America Movil, and Anil K. Bera, an Indian econometrician who is Professor of Economics in the Department of Economics at the University of Illinois at Urbana–Champaign. They derived it while working on their Ph.D. theses at the Australian National University.

The Jarque-Bera test is a test for normality. Normality is one of the assumptions for many statistical tests, like the t-test or F-test; the Jarque-Bera test is usually run before one of these tests to confirm normality. It is usually used for large data sets, because other normality tests are not reliable when n is large.

The data could take many forms, including:

● Time Series Data.

● Errors in a regression model.

● Data in a Vector.

A normal distribution has a skew of zero (i.e. it’s perfectly symmetrical around the mean) and a

kurtosis of three; kurtosis tells you how much data is in the tails and gives you an idea about how

“peaked” the distribution is. It’s not necessary to know the mean or the standard deviation for the

data in order to run the test. This test statistic is always positive, and if it is not close to zero, it

shows that the sample data do not have a normal distribution.


The formula for the Jarque-Bera test statistic (usually shortened to just JB test statistic) is:

JB = n [ (√b₁)²/6 + (b₂ − 3)²/24 ]

where:

√b₁ = the skewness coefficient,
b₂ = the kurtosis coefficient,
n = the sample size.

Formulas for getting skewness and kurtosis (the sums run over i = 1, …, n):

skewness = [ (1/n) Σ (xᵢ − x̄)³ ] / [ (1/n) Σ (xᵢ − x̄)² ]^(3/2)

kurtosis = [ (1/n) Σ (xᵢ − x̄)⁴ ] / [ (1/n) Σ (xᵢ − x̄)² ]²

If JB > χ²(α, 2), then the null hypothesis is rejected, meaning the data are not normally distributed, where α is the significance level.

The null hypothesis for the test is that the data is normally distributed. The alternate hypothesis is

that the data does not come from a normal distribution.
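To make the computation concrete, here is a minimal Python sketch (added for illustration) that builds JB from the population moments, following the formulas above; the sample used is the 15 scores from the Excel example later in this module.

```python
def jarque_bera(x):
    """JB = n * [S^2/6 + (K - 3)^2/24], with skewness S and kurtosis K
    computed from the population (1/n) moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # variance
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5                      # third standardized moment
    kurt = m4 / m2 ** 2                        # fourth standardized moment
    return n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)

scores = [10, 8, 9, 5, 7, 6, 6, 9, 6, 6, 8, 5, 7, 8, 4]
jb = jarque_bera(scores)
print(round(jb, 3), jb < 5.99)  # statistic compared with chi-square(0.05, 2)
```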


Properties of the skewness measure:

1. Zero skewness implies a symmetric distribution (e.g. the Normal or t-distribution).

2. Positive skewness means that the distribution has a long right tail; it is skewed to the right.

3. Negative skewness means that the distribution has a long left tail; it is skewed to the left.

Properties of the kurtosis measure:

1. A distribution with kurtosis = 3 is said to be mesokurtic.

2. A distribution with kurtosis > 3 is said to be leptokurtic or fat-tailed. For example, stock returns are known to be leptokurtic, i.e. more "peaked" and fat-tailed than the normal distribution.

3. A distribution with kurtosis < 3 is said to be platykurtic.

The Jarque-Bera test uses these two (statistical) properties of the normal

distribution, namely:

The Normal distribution is symmetric around its mean (skewness = zero)

The Normal distribution has kurtosis three, or Excess kurtosis = zero

How to do a Jarque-Bera test in practice

1. Calculate the skewness in the sample.

2. Calculate the kurtosis in the sample.

3. Calculate the Jarque-Bera test statistic.

4. Compare the Jarque-Bera test statistic with the critical values in the chi-square table, 2 df. (Dwight, 2019)

Example: for a sample with (1/n) Σ (xᵢ − x̄)³ = −51.4, (1/n) Σ (xᵢ − x̄)⁴ = 4,203.8, and variance 37.8:

skewness = −51.4 / (37.8)^(3/2) = −0.2212

kurtosis = 4,203.8 / (37.8)² = 2.9421

JB = n [ (−0.2212)²/6 + (2.9421 − 3)²/24 ]

χ²(0.05; 2) = 5.99

The null hypothesis is not rejected because JB < χ²(α, 2).

What the Results Mean

In general, a large J-B value indicates that errors are not normally distributed.


For example, in MATLAB, a result of 1 means that the null hypothesis has been rejected at the 5% significance level; in other words, the data does not come from a normal distribution. A value of 0 indicates that the null hypothesis cannot be rejected, i.e. the data are consistent with a normal distribution.

Unfortunately, most statistical software does not support this test. In order to interpret results,

you may need to do a little comparison (and so you should be intimately familiar with hypothesis

testing). Checking p-values is always a good idea. For example, a tiny p-value and a large chi-

square value from this test means that you can reject the null hypothesis that the data is normally

distributed.

If the data comes from a normal distribution, the JB statistic asymptotically has a chi-squared

distribution with two degrees of freedom, so the statistic can be used to test the hypothesis that

the data are from a normal distribution. The null hypothesis is a joint hypothesis of the skewness

being zero and the excess kurtosis being zero. Samples from a normal distribution have an

expected skewness of 0 and an expected excess kurtosis of 0 (which is the same as a kurtosis of

3). As the definition of JB shows, any deviation from this increases the JB statistic.

Steps For Jarque-Bera test using Excel

Step 1: Input the data. First, input the dataset into one column.

Step 2: Calculate the Jarque-Bera test statistic. With n in cell C2, the skewness in C3, and the kurtosis in C4, first get:

Skewness: =SKEW(A2:A16)
Kurtosis: =KURT(A2:A16)
JB test statistic: =(C2/6)*(C3^2+(C4^2/4))

(Excel's KURT returns excess kurtosis, so the JB formula divides K² by 4 rather than using (K − 3)²/4.)

Score data: 10, 8, 9, 5, 7, 6, 6, 9, 6, 6, 8, 5, 7, 8, 4

n (sample size)       15
S (skewness)          0.118547
K (kurtosis)          -0.76348
JB (test statistic)   0.3994

Step 3: Calculate the p-value of the test.

Recall that under the null hypothesis of normality, the test statistic JB follows a chi-square distribution with 2 degrees of freedom. Thus, to find the p-value for the test we use the following function in Excel: =CHISQ.DIST.RT(JB test statistic, 2)

The p-value of the test is 0.8190. Since this p-value is greater than 0.05, we fail to reject the null hypothesis. We do not have enough evidence to say that the dataset is not normally distributed.
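The same test is available in scipy (assumed installed). Note that scipy computes skewness and kurtosis from the population moments rather than from Excel's sample-adjusted SKEW and KURT, so its statistic differs somewhat; the conclusion is the same.

```python
from scipy.stats import jarque_bera

scores = [10, 8, 9, 5, 7, 6, 6, 9, 6, 6, 8, 5, 7, 8, 4]
stat, p = jarque_bera(scores)
print(round(stat, 3), round(p, 3))  # p well above 0.05 -> fail to reject normality
```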

Reference:

https://www.spss-tutorials.com/spss-kolmogorov-smirnov-test-for-normality

https://www.graphpad.com/guides/prism/latest/statistics/interpreting_results_kolmogorov-smirnov_test.htm

https://keydifferences.com/difference-between-null-and-alternative-hypothesis.html
https://collinsdwight.medium.com/jarque-bera-test-of-normality-a108a1515b22

https://www.statisticshowto.com/jarque-bera-test

https://digensia.wordpress.com/2012/05/07/the-jarque-bera-test-for-normality-testing

https://www.r-bloggers.com

https://www.statology.org/jarque-bera-test-excel

https://www.youtube.com/watch?v=ZbjsXS8oKfo

https://onlinepubs.trb.org/onlinepubs/nchrp/cd-22/manual/v2appendixc.pdf

https://www.sagepub.com/sites/default/files/upm-binaries/40007_Chapter8.pdf

https://www.youtube.com/watch?v=ztmua4TrLLM
