
STATE UNIVERSITY OF MEDAN

"ASSOCIATIVE HYPOTHESIS TESTING"

ROUTINE TASK

Submitted as One of the Requirements for Passing the Course


Statistics

By Group 4 :

Bismi Amrina (4191141012)
Michelle Leony Simangunsong (4192141002)
Ratu Nurul Aulia (4191441010)

Biology Education Study Program

DEPARTMENT OF BIOLOGY
FACULTY OF MATHEMATICS AND NATURAL SCIENCES
MEDAN
2021

TABLE OF CONTENTS

Cover
Table of Contents
CHAPTER I PRELIMINARY
1.1 Background
1.2 Purpose
1.3 Benefits
CHAPTER II MATERIAL
2.1 Definition of Associative Hypothesis Testing
2.2 Associative Hypothesis Testing Techniques For Various Data Scales
2.2.1 Product Moment Correlation
2.2.2 Multiple Correlation
2.2.3 Partial Correlation
2.2.4 Contingency Correlation
2.3 Regression Analysis
2.3.1 Simple Linear Regression
REFERENCES

CHAPTER I
PRELIMINARY
1.1 Background
Hypotheses can be interpreted as statistical statements about population parameters. A hypothesis is a provisional answer to the research problem. It can be a statement about the relationship between two or more variables (associative), a comparison between groups (comparative), or a statement about a single stand-alone variable (descriptive).
An associative hypothesis is an assumption about the existence of a relationship between variables in the population, which is tested through the relationship between variables in a sample taken from that population. There are three forms of relationship: symmetrical relationships, causal relationships, and interactive relationships (mutual influence). A relationship between two or more variables is sought by calculating the correlation between those variables. The correlation is a number that shows the direction and strength of the relationship between two or more variables. The direction is expressed as a positive or negative relationship, while the strength is expressed by the magnitude of the correlation coefficient.
The relationship between two or more variables is said to be positive if an increase in the value of one variable is accompanied by an increase in the other variable, and a decrease in one variable is accompanied by a decrease in the other. The relationship is said to be negative if an increase in the value of one variable is accompanied by a decrease in the other variable, and a decrease in one variable is accompanied by an increase in the other.
There are various statistical correlation techniques that can be used to test associative hypotheses. Which coefficient to use depends on the type of data to be analyzed. Nonparametric statistics are used for nominal and ordinal data, while parametric statistics are used for interval and ratio data.

1.2 Purpose
The objectives of this paper are as follows:
1. To know the concept of associative hypothesis testing
2. To understand how to use the formulas and under what conditions to use associative hypothesis tests
3. To understand the statistical correlation techniques that can be used to test associative hypotheses

1.3 Benefits
The benefits of this paper are as follows:
1. Being able to describe a research problem using associative hypothesis testing
2. Being able to describe the variables that will be tested using associative hypothesis testing
3. Helping students who are preparing a thesis to process the data they obtain

CHAPTER II
MATERIAL

2.1 Definition of Associative Hypothesis Testing


An associative hypothesis is an assumption that a relationship between variables exists in the population, tested through the relationship found in sample data. For this reason, the first step of the proof is to calculate the correlation coefficient between the variables in the sample, and the coefficient found is then tested for significance. Testing an associative hypothesis therefore means testing whether the correlation coefficient found in the sample can be applied to the entire population from which the sample was taken.
There are three kinds of relationships between variables, namely symmetrical relationships, causal relationships, and interactive relationships (influencing each other). A relationship between two or more variables is sought by calculating the correlation coefficient between these variables. The correlation coefficient is a number that shows the direction and strength of the relationship between variables. The direction of the relationship is indicated by a positive or negative sign, while the strength of the relationship is indicated by the magnitude of the coefficient, which ranges from 0 to ± 1.
A positive relationship between two variables means that an increase in one variable is accompanied by an increase in the other variable. A negative relationship means that when one variable increases in value, the other variable decreases. An example of a positive relationship is the one between income and monthly expenditure: when income increases, monthly expenditure also increases. An example of a negative relationship is the one between age and memory: the older a person is, the weaker their memory, and vice versa.
The correlation coefficient, which ranges from 0 to ± 1, shows the strength or weakness of the relationship between the two variables. A correlation coefficient of +1 indicates a perfect positive relationship between the two variables. Perfect here means that the rise or fall of one variable can be explained completely by the other variable, without the slightest error. A correlation coefficient of zero means that there is no linear relationship at all between the two variables, that is, the rise or fall of one variable is not accompanied by any systematic change in the other. In social research, however, correlations of exactly zero or exactly one are rare, if they occur at all.

In statistical analysis, the magnitude of the correlation coefficient can be illustrated by the distribution of the data points on an X-Y scatter plot. The figures illustrating the correlation coefficient are as follows:

[Figure 1, Figure 2, Figure 3: scatter plots of Y (vertical axis) against X (horizontal axis)]

Figure 1 shows a distribution of variable X against variable Y that has no particular pattern. That is, when variable X is low, variable Y can be low or high, and likewise when variable X is high. This pattern shows that there is no relationship between the two variables.
Figure 2 shows that when variable X is low, variable Y is also low, and when variable X is high, variable Y is also high. This kind of pattern shows that there is a strong positive relationship between the two variables.
Figure 3 shows that when variable X is low, variable Y is high, and when variable X is high, variable Y is low. This kind of pattern shows that there is a strong negative relationship between the two variables.

2.2 Associative Hypothesis Testing Techniques For Various Data Scales


There are a variety of statistical correlation techniques that can be used to test associative hypotheses. Which technique to use depends on the type of data being analyzed. Correlation tests for interval and ratio data use parametric statistics, while correlation tests for nominal and ordinal data use nonparametric statistics.
Associative Hypothesis Testing Techniques For Various Data Scales:

Data Scale        Statistical Testing Technique
Interval/Ratio    Pearson product moment correlation
                  Multiple correlation
                  Partial correlation
Ordinal           Spearman rank correlation
                  Kendall's tau
Nominal           Contingency coefficient
2.2.1 Product Moment Correlation


This correlation technique is used to find the relationship, and to test the hypothesis of a relationship, between two variables measured on an interval or ratio scale that come from one population. The simplest formula for calculating the correlation is as follows:

$$ r_{xy} = \frac{\sum xy}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}} $$

Where:
rxy = Correlation between variable X and variable Y
x = (Xi - mean of X)
y = (Yi - mean of Y)

Sample case
We want to know whether there is a relationship between income and expenses. For this purpose, data were collected from 10 randomly chosen respondents. The data obtained are as follows:

X (income)     800 900 700 600 700 800 900 600 500 500
Y (expenses)   300 300 200 200 200 200 300 100 100 100

Ho : There is no relationship between income and expenses
Ha : There is a relationship between income and expenses

Working in units of 100 (so that X = 8, 9, 7, ... and Y = 3, 3, 2, ...), the correlation formula above gives:
Mean of X = 7
Mean of Y = 2
Σx² = 20
Σy² = 6
Σxy = 10
so that rxy = 10 / √(20 × 6) = 0.9129
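
The calculation above can be checked with a short script. The following is a minimal sketch in Python using only the standard library. The function name pearson_r is illustrative and not part of the original material. It applies the deviation-score form of the product moment formula to the ten observations of the sample case.

```python
import math

# Income (X) and expenses (Y) of the 10 respondents in the sample case.
X = [800, 900, 700, 600, 700, 800, 900, 600, 500, 500]
Y = [300, 300, 200, 200, 200, 200, 300, 100, 100, 100]


def pearson_r(xs, ys):
    """Product moment correlation from deviation scores x = X - mean, y = Y - mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


r = pearson_r(X, Y)
print(round(r, 4))
```

Because r is a ratio of deviation sums, it does not matter whether the data are entered as 800, 900, ... or in units of 100 as 8, 9, ...; the coefficient is the same.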

So, there is a positive correlation of 0.9129 between income and expenses. This means that the greater the income, the greater the expenses. The question is whether this sample correlation is significant, that is, whether it can be generalized to say that the correlation also exists in the population. For this reason, the calculated r must be compared with the table r (the product moment r table) at a given significance level. From the product moment r table, at the 5% significance level with N = 10, table r = 0.632. The calculated r (0.9129) is greater than table r, so Ho is rejected and Ha is accepted. It can be concluded that there is a strong and significant relationship between income and expenses.
Besides comparing the calculated r with the table r, the significance of a correlation can also be tested by comparing the calculated t with the table t. Here, t is found using the formula:

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$

For the case above, the calculated t = 6.33.
From the t table, at the 5% significance level, two-tailed test, with dk = n - 2 = 8, table t = 2.306.
Because the calculated t is greater than the table t, Ho is rejected and Ha is accepted, which means that there is a strong and significant relationship between income and expenses.
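
The t test for the correlation coefficient can be sketched in Python as follows. It assumes the usual form t = r√(n - 2) / √(1 - r²) and takes r = 0.9129, n = 10 and the table value 2.306 from the text. The function name t_from_r is illustrative.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a product moment correlation, with dk = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.9129       # calculated correlation from the sample case
n = 10           # number of respondents
t_table = 2.306  # two-tailed, 5% level, dk = 8 (value quoted in the text)

t_count = t_from_r(r, n)
reject_h0 = t_count > t_table
print(round(t_count, 2), reject_h0)
```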

2.2.2 Multiple Correlation


Multiple correlation is a number that shows the direction and strength of the relationship between two or more independent variables, taken together, and another (dependent) variable. An understanding of multiple correlation can be gained from the following figure:

[Figure: X1, X2 and X3 related jointly to Y]
X1 = Employee welfare
X2 = Leadership model
X3 = Supervision
Y = Work effectiveness

From the figure above, it can be seen that the multiple correlation R is not the sum of the simple correlations of the individual variables (it is not r1 + r2 + r3). The multiple correlation (R) is the joint relationship between X1, X2, X3 and Y.
The multiple correlation formula for two independent variables is as follows:

$$ r_{y.x_1.x_2} = \sqrt{\frac{r_{yx_1}^2 + r_{yx_2}^2 - 2\, r_{yx_1}\, r_{yx_2}\, r_{x_1x_2}}{1 - r_{x_1x_2}^2}} $$

Where:
r y.x1.x2 = Correlation between variables X1 and X2 together and the variable Y (the multiple correlation R)
r yx1 = Product moment correlation between X1 and Y
r yx2 = Product moment correlation between X2 and Y
r x1x2 = Product moment correlation between X1 and X2

Example case:
To examine how the leadership model (X1) and office layout (X2) relate to job satisfaction (Y), relevant data were collected. From these data, the simple correlations were calculated, giving:
1. Correlation between the leadership model and job satisfaction = 0.45
2. Correlation between office layout and job satisfaction = 0.48
3. Correlation between the leadership model and office layout = 0.22

Entering these values into the multiple correlation formula gives ry.x1.x2 = 0.5959
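
The result can be reproduced with a few lines of Python. This is a sketch that assumes the standard two-predictor form of the multiple correlation formula, R² = (r²yx1 + r²yx2 - 2·ryx1·ryx2·rx1x2) / (1 - r²x1x2). The function name multiple_r is illustrative.

```python
import math


def multiple_r(r_yx1, r_yx2, r_x1x2):
    """Multiple correlation of Y with X1 and X2 from the three simple correlations."""
    numerator = r_yx1 ** 2 + r_yx2 ** 2 - 2 * r_yx1 * r_yx2 * r_x1x2
    return math.sqrt(numerator / (1 - r_x1x2 ** 2))


# Simple correlations from the leadership model / office layout example.
R = multiple_r(r_yx1=0.45, r_yx2=0.48, r_x1x2=0.22)
print(round(R, 4))
```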

The results of the simple and multiple correlation calculations can be summarized as follows: ryx1 = 0.45, ryx2 = 0.48, rx1x2 = 0.22, and R = 0.5959.

From these calculations, it turns out that the multiple correlation R is greater than each of the individual correlations ryx1 and ryx2.
The significance of the multiple correlation coefficient can be tested using the following F test formula:

$$ F_h = \frac{R^2 / k}{\left(1 - R^2\right) / \left(n - k - 1\right)} $$

Where:
R = Coefficient of multiple correlation
k = Number of independent variables
n = Number of sample members

Based on the figures that have been found, if n = 30, then Fh can be calculated using the formula above, giving Fh = 7.43.
The calculated F value is then compared with the table F with dk numerator = k = 2 and dk denominator = n - k - 1 = 30 - 2 - 1 = 27. At the 5% significance level, the table F value is 3.35.
Comparing the two, the calculated F is greater than the table F, which means that Ho is rejected and Ha is accepted. Thus, the multiple correlation coefficient found is significant (it can be applied to the population from which the sample was drawn).
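
The F statistic can be checked with the following Python sketch. It assumes the usual form Fh = (R² / k) / ((1 - R²) / (n - k - 1)) and uses R = 0.5959, k = 2 and n = 30 from the text. The function name f_for_multiple_r is illustrative.

```python
def f_for_multiple_r(R, k, n):
    """F statistic for a multiple correlation R with k predictors and n sample members."""
    return (R ** 2 / k) / ((1 - R ** 2) / (n - k - 1))


R = 0.5959  # multiple correlation found in the example
k = 2       # number of independent variables
n = 30      # number of sample members

F_h = f_for_multiple_r(R, k, n)
dk_numerator = k
dk_denominator = n - k - 1
print(round(F_h, 2), dk_numerator, dk_denominator)
```

The denominator degrees of freedom follow from the same n that is used to compute Fh, so the table F should be read at dk = (2, 27).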

2.2.3 Partial Correlation


Partial correlation is used when the researcher intends to determine the effect of, or relationship between, an independent variable and the dependent variable while one of the other independent variables is held constant (controlled). So, partial correlation is a number that shows the direction and strength of the relationship between two variables after another variable that is thought to affect the relationship has been controlled, that is, held constant.
Example:
1. Correlation between IQ test scores and school achievement = 0.58
2. Correlation between study time and school achievement = 0.10
3. Correlation between IQ test scores and study time = -0.40

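Suppose the question is: for students whose study time is the same (that is, with study time partialled out), what is the correlation between IQ and school achievement? With Y = school achievement, X1 = IQ and X2 = study time, the answer can be found using the following partial correlation formula:

$$ r_{yx_1 \cdot x_2} = \frac{r_{yx_1} - r_{yx_2}\, r_{x_1x_2}}{\sqrt{\left(1 - r_{x_1x_2}^2\right)\left(1 - r_{yx_2}^2\right)}} $$

The formula above can be read as: the correlation between Y and X1 when the variable X2 is controlled, or the correlation between Y and X1 when X2 is constant.
Meanwhile, if X1 is controlled, the formula is:

$$ r_{yx_2 \cdot x_1} = \frac{r_{yx_2} - r_{yx_1}\, r_{x_1x_2}}{\sqrt{\left(1 - r_{x_1x_2}^2\right)\left(1 - r_{yx_1}^2\right)}} $$

The significance of the partial correlation coefficient (rp) can be tested with the following formula:

$$ t = \frac{r_p \sqrt{n - 3}}{\sqrt{1 - r_p^2}} $$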
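
As an illustration, the IQ, study time and school achievement example can be worked through in Python. This sketch assumes the standard first-order partial correlation formula, r(yx1·x2) = (ryx1 - ryx2·rx1x2) / √((1 - r²x1x2)(1 - r²yx2)). The function name partial_r is illustrative, and the resulting value is computed here rather than quoted from the original material.

```python
import math


def partial_r(r_yx1, r_yx2, r_x1x2):
    """Correlation between Y and X1 with X2 held constant (first-order partial)."""
    numerator = r_yx1 - r_yx2 * r_x1x2
    denominator = math.sqrt((1 - r_x1x2 ** 2) * (1 - r_yx2 ** 2))
    return numerator / denominator


# Y = school achievement, X1 = IQ test score, X2 = study time.
r_p = partial_r(r_yx1=0.58, r_yx2=0.10, r_x1x2=-0.40)
print(round(r_p, 2))
```

Under these assumptions the partial correlation (about 0.68) is higher than the simple correlation of 0.58, because IQ and study time are negatively correlated in the example.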

2.2.4 Contingency Correlation


This type of correlation is used to calculate the relationship between variables when the data are nominal. The technique is closely related to Chi Square, which is used to test the comparative hypothesis of k independent samples. Therefore, the formula used contains the Chi Square value. The formula is as follows:

$$ C = \sqrt{\frac{\chi^2}{N + \chi^2}} $$

The value of Chi Square (χ²) is determined by the formula:

$$ \chi^2 = \sum \frac{\left(f_o - f_h\right)^2}{f_h} $$

Example:
Research was conducted to find out whether there is a relationship between profession and the type of sport that is often practiced. Professions are grouped into: Doctor, Lawyer, Lecturer, Businessman (Dr, Pc, Ds, Bs). The types of sport are grouped into: Golf, Tennis, Badminton, Football (Gf, T, Bt, Sp). The number of respondents from whom data were collected is as follows:

Dr = 58
Pc = 75
Ds = 68
Bs = 81
Total number = 282

The hypotheses are formulated as follows:

Ho : There is no significant relationship between a person's profession and the type of sport they like
Ha : There is a strong and significant relationship between a person's profession and the type of sport they like

Based on a sample of four randomly selected professional groups, data were obtained as follows:

To calculate the expected frequency (fh), first calculate the proportion of the whole sample that enjoys golf, tennis, badminton, and football:
a. Proportion liking golf = 80/282 = 0.284
b. Proportion liking tennis = 80/282 = 0.284
c. Proportion liking badminton = 70/282 = 0.248
d. Proportion liking football = 52/282 = 0.184
Next, the expected frequency (fh) of each professional group for each type of sport can be calculated:
1. Those who like golf:
a. Fh Doctor : 0.284 x 58 = 16.472
b. Fh Lawyer : 0.284 x 75 = 21.300
c. Fh Lecturer : 0.284 x 68 = 19.312
d. Fh Businessman : 0.284 x 81 = 23.004
2. Those who like tennis:
a. Fh Doctor : 0.284 x 58 = 16.472
b. Fh Lawyer : 0.284 x 75 = 21.300
c. Fh Lecturer : 0.284 x 68 = 19.312
d. Fh Businessman : 0.284 x 81 = 23.004
3. Those who like badminton:
a. Fh Doctor : 0.248 x 58 = 14.384
b. Fh Lawyer : 0.248 x 75 = 18.600
c. Fh Lecturer : 0.248 x 68 = 16.864
d. Fh Businessman : 0.248 x 81 = 20.088
4. Those who like football:
a. Fh Doctor : 0.184 x 58 = 10.672
b. Fh Lawyer : 0.184 x 75 = 13.800
c. Fh Lecturer : 0.184 x 68 = 12.512
d. Fh Businessman : 0.184 x 81 = 14.904
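
The sixteen expected frequencies listed above can be generated in one pass. The following Python sketch uses the group sizes and sport totals given in the text. The dictionary names are illustrative, and the proportions are rounded to three decimals only to mirror the hand calculation (using the unrounded proportions would change the results slightly).

```python
# Respondents per profession (row totals) and per sport (column totals), from the text.
profession_totals = {"Doctor": 58, "Lawyer": 75, "Lecturer": 68, "Businessman": 81}
sport_totals = {"Golf": 80, "Tennis": 80, "Badminton": 70, "Football": 52}

n = sum(profession_totals.values())  # total number of respondents

expected = {}
for sport, sport_total in sport_totals.items():
    # Proportion of the whole sample that likes this sport, rounded as in the text.
    proportion = round(sport_total / n, 3)
    for profession, profession_total in profession_totals.items():
        # Expected frequency = proportion liking the sport x size of the profession group.
        expected[(profession, sport)] = proportion * profession_total

for (profession, sport), fh in expected.items():
    print(f"{sport:10s} {profession:12s} fh = {fh:.3f}")
```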

The results of these calculations are then entered into the table as follows:

Next, the value of Chi Square (χ²) can be calculated using the formula:

$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$

In this case, O (observed) = fo, and E (expected) = fh

Thus, the calculated Chi Square (χ²) value is 29.881. To calculate the contingency coefficient C, this value is entered into the formula:

$$ C = \sqrt{\frac{\chi^2}{N + \chi^2}} = \sqrt{\frac{29.881}{282 + 29.881}} = 0.31 $$

So, the coefficient of the relationship between type of profession and preferred sport is 0.31. The significance of the coefficient C can be tested by comparing the calculated Chi Square (χ²) value with the table Chi Square (χ²) value at a given significance level and dk. Here dk = (k - 1)(r - 1), where k = number of samples (professional groups) = 4 and r = number of sport categories = 4. So dk = (4 - 1)(4 - 1) = 9. With dk = 9 and a significance level of 0.05, the table Chi Square (χ²) value is 16.92. The test rule is that if the calculated Chi Square (χ²) value is greater than the table value, the relationship is significant. In the case above, the calculated Chi Square (χ²) value is greater than the table value (29.881 > 16.92). Thus, it can be concluded that Ho is rejected and Ha is accepted. So, profession has a significant relationship with the type of sport a person likes, with a coefficient of 0.31. The data in the sample and the correlation found reflect the state of the population from which the sample was drawn.
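
The last step can be sketched in Python. The calculated Chi Square value (29.881) and the sample size (282) are taken from the text. The critical value 16.92 for dk = 9 at the 5% level is read from a standard chi-square table and is entered here by hand, so it should be treated as an input rather than something the code derives.

```python
import math

chi_square_count = 29.881  # calculated Chi Square from the example
n = 282                    # total number of respondents
k, r = 4, 4                # number of professions and number of sport categories

dk = (k - 1) * (r - 1)     # degrees of freedom
chi_square_table = 16.92   # table value for dk = 9 at the 5% level (entered by hand)

# Contingency coefficient C = sqrt(chi2 / (N + chi2)).
C = math.sqrt(chi_square_count / (n + chi_square_count))
significant = chi_square_count > chi_square_table
print(dk, round(C, 2), significant)
```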

2.3 Regression Analysis


Correlation and regression are very closely related. Every regression analysis involves a correlation, but not every correlation is followed up with regression. A correlation that is not followed up with regression is a correlation between two variables that do not have a causal (cause-effect) relationship. Regression analysis is carried out when the relationship between two variables is a causal or functional relationship. Whether two variables have a causal relationship or not must be decided on the basis of theories or concepts about the two variables.
The relationship between the leadership model and job satisfaction is a causal relationship, because in theory the leadership model affects the satisfaction of the workers under that leadership. However, the relationship between the chirping of emprit birds and the arrival of guests is not a causal relationship, even though (it is said that) the two have something to do with each other.
We use regression analysis when we want to know how the dependent variable (criterion variable) can be predicted from the independent variable (predictor variable). The result of regression analysis can be used for projection (prediction) and planning, that is, how the rise or fall of one variable can be controlled by increasing or decreasing another variable. For example, if it is known that the number of items sold is related to and influenced by the amount spent on advertising, then a marketer can plan how much to spend on advertising in order to double the number of sales.

2.3.1 Simple Linear Regression


Simple regression is based on the functional or causal relationship between one independent variable and one dependent variable. The general equation for simple linear regression is:
Y' = a + bX
Where:
Y' : Predicted value of the subject on the dependent variable
a : Value of Y when X = 0 (the constant)
b : Direction coefficient or regression coefficient, which shows the rate of increase or decrease in the dependent variable for each unit change in the independent variable
X : Value of the subject on the independent variable
Technically, b is the tangent of the regression line's angle, that is, the ratio of the change in the dependent variable to the change in the independent variable along the line once the regression equation has been found.

[Figure: regression line Y = 2 + 0.33X]

a = 2
b = (3 - 2) / (3 - 0) = 1/3 = 0.33

The formulas for obtaining the values of a and b are as follows:

$$ b = r \, \frac{S_y}{S_x} $$

$$ a = \bar{Y} - b\bar{X} $$

Where:
r : product moment correlation coefficient between variable X and variable Y
Sy : standard deviation of variable Y
Sx : standard deviation of variable X
So the value of b is a function of the correlation coefficient. If the correlation coefficient is high, the value of b is also large. Conversely, if the correlation coefficient is low, the value of b is also small. In addition, if the correlation coefficient is negative, the value of b is also negative.
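
To see how b = r(Sy/Sx) and a = mean of Y - b(mean of X) work in practice, the following Python sketch applies them to the income and expenses data from the product moment example, expressed in units of 100. Reusing that data set here is a choice made purely for illustration. The original material does not fit a regression line to it.

```python
import math
import statistics

# Income (X) and expenses (Y) from the product moment example, in units of 100.
X = [8, 9, 7, 6, 7, 8, 9, 6, 5, 5]
Y = [3, 3, 2, 2, 2, 2, 3, 1, 1, 1]

mean_x = statistics.mean(X)
mean_y = statistics.mean(Y)

# Product moment correlation from deviation scores.
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sum_x2 = sum((x - mean_x) ** 2 for x in X)
sum_y2 = sum((y - mean_y) ** 2 for y in Y)
r = sum_xy / math.sqrt(sum_x2 * sum_y2)

# Sample standard deviations of X and Y.
s_x = statistics.stdev(X)
s_y = statistics.stdev(Y)

b = r * s_y / s_x        # slope as a function of the correlation coefficient
a = mean_y - b * mean_x  # intercept
print(round(b, 2), round(a, 2))
```

The slope has the same sign as r, which is the point made in the paragraph above: a negative correlation would give a negative b.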
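Apart from the formulas above, the values of a and b can also be found using the following formulas:

$$ a = \frac{\left(\sum Y\right)\left(\sum X^2\right) - \left(\sum X\right)\left(\sum XY\right)}{n\sum X^2 - \left(\sum X\right)^2} $$

$$ b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2} $$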

Example of Simple Linear Regression
The following data are the result of observing the service quality score (X) and the average sales value (Y) of Balonpas:
No.   X    Y      No.   X    Y
1     54   167    18    45   160
2     50   155    19    47   155
3     53   148    20    53   159
4     45   146    21    49   159
5     48   170    22    56   172
6     63   173    23    57   168
7     46   149    24    50   159
8     56   166    25    49   150
9     52   170    26    58   165
10    56   174    27    48   159
11    47   156    28    52   162
12    56   158    29    56   168
13    55   150    30    54   166
14    52   160    31    59   177
15    50   157    32    47   149
16    60   177    33    48   155
17    55   166    34    56   160

To calculate the regression equation, the following working table is needed:

No.   X      Y      XY       X^2     Y^2
1     54     167    9,018    2,916   27,889
2     50     155    7,750    2,500   24,025
3     53     148    7,844    2,809   21,904
4     45     146    6,570    2,025   21,316
5     48     170    8,160    2,304   28,900
6     63     173    10,899   3,969   29,929
7     46     149    6,854    2,116   22,201
8     56     166    9,296    3,136   27,556
9     52     170    8,840    2,704   28,900
10    56     174    9,744    3,136   30,276
11    47     156    7,332    2,209   24,336
12    56     158    8,848    3,136   24,964
13    55     150    8,250    3,025   22,500
14    52     160    8,320    2,704   25,600
15    50     157    7,850    2,500   24,649
16    60     177    10,620   3,600   31,329
17    55     166    9,130    3,025   27,556
18    45     160    7,200    2,025   25,600
19    47     155    7,285    2,209   24,025
20    53     159    8,427    2,809   25,281
21    49     159    7,791    2,401   25,281
22    56     172    9,632    3,136   29,584
23    57     168    9,576    3,249   28,224
24    50     159    7,950    2,500   25,281
25    49     150    7,350    2,401   22,500
26    58     165    9,570    3,364   27,225
27    48     159    7,632    2,304   25,281
28    52     162    8,424    2,704   26,244
29    56     168    9,408    3,136   28,224
30    54     166    8,964    2,916   27,556
31    59     177    10,443   3,481   31,329
32    47     149    7,003    2,209   22,201
33    48     155    7,440    2,304   24,025
34    56     160    8,960    3,136   25,600

Σ     1,782  5,485  288,380  94,098  887,291

Using the formulas to calculate the values of a and b, we get:

$$ a = \frac{(5{,}485)(94{,}098) - (1{,}782)(288{,}380)}{(34)(94{,}098) - (1{,}782)^2} = 93.85 $$

$$ b = \frac{(34)(288{,}380) - (1{,}782)(5{,}485)}{(34)(94{,}098) - (1{,}782)^2} = 1.29 $$

Having found the values of a and b, the regression equation can be written as:

Y = 93.85 + 1.29 X

This regression equation can then be used to predict the value of the dependent variable for a given value of the independent variable. For example, if the service quality score is set at 64, then the average sales value can be predicted as:

Y = 93.85 + (1.29 x 64) = 176.41

From the regression equation above, it can be interpreted that if the service quality score increases by one unit, the average sales value per month will increase by 1.29 units. It can also be said that for every increase of 10 in the service quality score, the average sales value will increase by 12.9.
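
The whole calculation can be reproduced from the 34 observations with a short Python script. This is a sketch using the sum-based formulas for a and b. The variable and function names are illustrative. The prediction uses the coefficients rounded to two decimals, as in the text, which is why it gives 176.41 (the unrounded coefficients give a slightly lower value).

```python
# Service quality score (X) and average sales value (Y) for the 34 observations.
X = [54, 50, 53, 45, 48, 63, 46, 56, 52, 56, 47, 56, 55, 52, 50, 60, 55,
     45, 47, 53, 49, 56, 57, 50, 49, 58, 48, 52, 56, 54, 59, 47, 48, 56]
Y = [167, 155, 148, 146, 170, 173, 149, 166, 170, 174, 156, 158, 150, 160, 157, 177, 166,
     160, 155, 159, 159, 172, 168, 159, 150, 165, 159, 162, 168, 166, 177, 149, 155, 160]

n = len(X)
sum_x = sum(X)
sum_y = sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

# Least squares intercept (a) and slope (b) from the column sums.
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b = (n * sum_xy - sum_x * sum_y) / denominator

# Coefficients rounded to two decimals, as used in the text.
a_rounded = round(a, 2)
b_rounded = round(b, 2)


def predict(x):
    """Predicted average sales value for a service quality score x."""
    return a_rounded + b_rounded * x


print(a_rounded, b_rounded, round(predict(64), 2))
```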
