A chi-square test (also chi-squared or χ² test) is any statistical hypothesis test
in which the sampling distribution of the test statistic is a chi-square distribution
when the null hypothesis is true, or any in which this is asymptotically true,
meaning that the sampling distribution (if the null hypothesis is true) can be made
to approximate a chi-square distribution as closely as desired by making the sample
size large enough.

There are basically two types of random variables, and they yield two types of data:
numerical and categorical. A chi-square (χ²) statistic is used to investigate whether
distributions of categorical variables differ from one another. Categorical variables
yield data in categories, and numerical variables yield data in numerical form.

The use of the chi-square test can be illustrated by using hypothetical data from a study
investigating the association between smoking and asthma among adults observed in a
community health clinic. The results obtained from classifying 150 individuals are shown in
Table 3. As Table 3 shows, among asthmatics the proportion of smokers was 40 percent (20/50),
while the corresponding proportion among asymptomatic individuals was 22 percent (22/100).
By applying the formula presented in Table 2, for the observed cell counts of 20, 30, 22, and 78
(Table 3) the corresponding expected counts are 14, 36, 28, and 72. The observed and expected
counts can then be used to calculate the chi-square test statistic as outlined in Equation 1. The
resulting value of the chi-square test statistic is approximately 5.36, and the associated p-value
for a chi-square distribution with one degree of freedom is 0.02. Therefore, if there were truly
no association between smoking and asthma, the probability of observing a difference in
proportions at least as large as 18 percentage points (40%–22%) by chance alone would be about
2 in 100. We would therefore conclude that the observed difference in the proportions is unlikely
to be explained by chance alone, and consider this result statistically significant.

Table 3
Hypothetical data showing chi-square test

                      Ever smoke cigarettes
Symptoms of asthma    Yes     No      Total
Yes                   20      30      50
No                    22      78      100
Total                 42      108     150
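
The arithmetic in this example can be checked with a short Python sketch. It assumes that
Equation 1 (not reproduced here) is the usual Pearson statistic, the sum over cells of
(observed - expected)^2 / expected, and that the Table 2 formula is row total times column
total divided by the grand total. The variable names are illustrative only. The p-value line
uses a closed form that holds only for one degree of freedom.

```python
import math

# Observed counts from Table 3: rows are symptoms of asthma (yes, no),
# columns are ever smoked cigarettes (yes, no).
observed = [[20, 30],
            [22, 78]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under no association (assumed Table 2 formula):
# row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (assumed Equation 1):
# sum over all cells of (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Upper-tail probability for ONE degree of freedom only:
# P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print("expected counts:", expected)
print("chi-square:", round(chi_square, 2))
print("p-value:", round(p_value, 3))
```

For tables with more than one degree of freedom the closed form above does not apply, and a
statistics library would normally be used for the p-value instead.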

In statistics, we often want to know whether the means of two populations are
equal. For example, do men and women earn equal wages on average? This
is easy to test using a two-sample t-test for the equality of means.
The problem with that test is that it cannot handle more than two
populations. What if we want to know whether Black, Latino, and White workers
earn the same wages on average? To answer a question like this, we need to
use ANOVA, which stands for analysis of variance.
Of course, variance is a measure of dispersion, not central tendency (like
the mean). So why do we analyze variance in order to test whether the
means of three or more groups are equal? Remember, sample means will
differ for two reasons. One, due to random sampling error, we cannot expect
multiple sample means to be exactly equal even if the groups really do have
the same population means. Two, the population means may really be different.
So, if the sample means differ only because of sampling error, we expect
those sample means to be "pretty close." If they are "not very close", then
we would conclude that the population means probably are different. Thus,
the variance in the sample means provides a way of testing whether the
sample means are "close enough" or not. If the variance between the groups
is small relative to the variance within the groups, we do not reject the
hypothesis that the population means are equal. If the variance between the
groups is relatively large, we conclude that the population means are not all equal.
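
The comparison of between-group and within-group variance can be sketched in Python as a
one-way ANOVA F statistic. The wage figures and group names below are invented purely for
illustration and do not come from any real study. The sketch stops at the F statistic,
because turning it into a p-value requires the F distribution, which is not in the standard
library.

```python
# Hypothetical hourly wages for three groups (invented for illustration).
groups = {
    "group_a": [11.0, 12.0, 13.0],
    "group_b": [12.0, 13.0, 14.0],
    "group_c": [15.0, 16.0, 17.0],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())

# Within-group sum of squares: how far observations are from their own group mean.
ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

# Mean squares divide each sum of squares by its degrees of freedom.
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

# A large F means the group means vary more than sampling error alone would suggest.
f_statistic = ms_between / ms_within
print("F statistic:", round(f_statistic, 2))
```

The F statistic is then compared against an F distribution with k - 1 and n_total - k degrees
of freedom to decide whether the between-group variance is "large".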

The one-sample t test is used to check whether the mean of a distribution differs from a
specified value. Example use cases include:
1. A ruler manufacturer wants to check whether the rulers produced on a given day
(in large quantities) meet the specified standard of 30 cm.
2. An investor wants to check whether abnormal returns occur around the date of a
financial statement announcement.
3. The Social Affairs Office (Dinas Sosial) wants to check whether workers in a region are
already paid above the regional minimum wage (UMR) or not.
In example #1, suppose the company produces 10,000 rulers in one day. It would be very
inefficient for the company to measure every ruler to see whether it meets the 30 cm
standard. The company can use sampling instead, for example by
taking 100 rulers from each package and measuring them. Using
a one-sample t test, the company can then see whether ruler production
conforms to 30 cm at a 5% significance level.
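
The ruler example can be sketched in Python as a one-sample t test. The ten measurements
below are invented for illustration (the text describes a sample of 100 per package), and the
critical value 2.262 is the standard two-sided 5% table value for 9 degrees of freedom,
hard-coded here as an assumption tied to this sample size.

```python
import math
import statistics

# Hypothetical ruler lengths in cm (invented for illustration).
lengths = [30.1, 29.9, 30.2, 30.0, 30.1, 30.3, 29.8, 30.0, 30.2, 30.1]
target = 30.0  # standard length being tested against

n = len(lengths)
sample_mean = statistics.mean(lengths)
sample_sd = statistics.stdev(lengths)  # sample standard deviation (n - 1 divisor)

# t statistic: distance of the sample mean from the target, in standard errors.
standard_error = sample_sd / math.sqrt(n)
t_statistic = (sample_mean - target) / standard_error

# Two-sided critical value at the 5% level for n - 1 = 9 degrees of freedom,
# taken from a t table; it must be changed if the sample size changes.
critical_value = 2.262
reject_null = abs(t_statistic) > critical_value

print("t statistic:", round(t_statistic, 3))
print("differs from 30 cm at the 5% level:", reject_null)
```

With these made-up measurements the test does not reject the hypothesis that the mean length
is 30 cm. The same calculation applies to examples #2 and #3 with a target of zero or the
UMR, respectively.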

In example #2, the investor computes the abnormal returns of several companies used as the sample.
The sample is then tested with a one-sample t test to see whether it differs from zero. If it
does, abnormal returns are present; if it does not differ from zero, there are no
abnormal returns around the date of the financial statement announcement.

In example #3, the Social Affairs Office takes a sample of workers in a region (if its staff
are willing, that is) and records each worker's wage. The distribution of these data is then tested
with a one-sample t test to see whether it differs from the UMR.

The assumption that must be met for a one-sample t test is normality. Testing for
normality is covered in the posts labeled "normalitas" on this blog. If the data are
not normally distributed, there are several options, depending on the researcher's judgment:
collecting more data so that the distribution becomes normal, transforming the data so that
they meet the normality assumption, or using a nonparametric statistical test that does not
require the normality assumption.


Problem: Sam Sleepresearcher hypothesizes that people who are allowed to sleep for only four
hours will score significantly lower than people who are allowed to sleep for eight hours on a
cognitive skills test. He brings sixteen participants into his sleep lab and randomly assigns them
to one of two groups. In one group he has participants sleep for eight hours and in the other
group he has them sleep for four hours. The next morning he administers the SCAT (Sam's Cognitive
Ability Test) to all participants. (Scores on the SCAT range from 1 to 9, with higher scores
representing better performance.)