
SRT605/SRT666

RELIABILITY AND VALIDITY

RELIABILITY
 The degree to which a measure is consistent and
unchanged over a short period of time
 e.g. consistency from trial to trial within a day/ from day
to day

 Without reliability, data would be entirely different
when collected at different times
VALIDITY
 The degree to which interpretations of
scores/measures from an instrument lead to correct
conclusions
 The data must be measures of what they are
supposed to measure
RELIABILITY & VALIDITY
RELIABLE BUT NOT VALID
Trial 1: 55.0 kg    Trial 2: 55.2 kg    Trial 3: 55.1 kg

RELIABLE AND VALID
Trial 1: 55.0 kg    Trial 2: 55.2 kg    Trial 3: 55.1 kg

NOT RELIABLE AND NOT VALID
Trial 1: 55.0 kg    Trial 2: 268.1 kg    Trial 3: 25.5 kg
RELIABILITY ANALYSIS
1. Test-retest reliability
 Measure the scores twice with the same instrument
 Comparing the scores from repeated testing of the
same participants with the same test

Monday, 178.2 cm Sunday, 178.3 cm


RELIABILITY ANALYSIS
2. Intra-rater reliability
 Assess how consistently the same rater can assign a
score or category to the same subjects
 e.g. re-scoring video footage

8 8 7
RELIABILITY ANALYSIS
3. Inter-rater reliability
 Evaluate how consistently different raters assign the
same score or category to the same subjects
 e.g. concurrent scoring during live observations, or
scoring video footage

8 7 8 8
RELIABILITY ANALYSIS
4. Internal consistency reliability
 assess the consistency of results across items within
a test
 the extent to which a test/procedures assess the
same characteristic, skill or quality

Q1. “I like to ride bicycles”
Q2. “I've enjoyed riding bicycles in the past”
Q3. “I hate bicycles”
Q4. “I like to swim”
INTERNAL CONSISTENCY
 Cronbach alpha measures how consistent respondents are when
they respond to the items
 Can be used to check reliability of a “composite score”
 Identify the items that reduce the reliability of the test

Response options: Strongly Not Agree (1), Not Agree (2), Agree (3), Strongly Agree (4)

Items          Score
Q1 (Item 1)    2
Q2 (Item 2)    2
Q3 (Item 3)    1
Q4 (Item 4)    2
Total score    17
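
A minimal sketch of the arithmetic behind Cronbach's alpha, assuming the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of the total score), where k is the number of items. The function name cronbach_alpha and the response data are hypothetical and are not taken from the datasets used in these slides.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item (column) across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each respondent's total (composite) score
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 4 items on a 1-4 scale
responses = [
    [3, 3, 3, 2],
    [4, 4, 3, 3],
    [2, 2, 2, 1],
    [3, 4, 3, 4],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(responses), 2))
```

When every item gives the same score for each respondent, the function returns 1, the upper limit of alpha.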
EXAMPLE OF A QUESTIONNAIRE

 The DASS21 is a 21-item self report instrument designed to
measure depression, anxiety and tension/stress
 There are 7 items per sub-scale/domain (depression, anxiety and
stress). The total score for each sub-scale/domain is used for
interpretation
 According to Noorlila et al., 2018, the DASS-21 scale has good
internal consistency, with a Cronbach alpha coefficient reported of
0.91, 0.85, and 0.86 for depression, anxiety, and stress subscale
respectively
INTERNAL CONSISTENCY
 Let's assess the internal consistency of the “physical
functioning and mental health scale”
 In the questionnaire, there are 2 domains :
 physical functioning (PF)
 10 items in the PF domain
 mental health (MH)
 5 items in the MH domain
INTERNAL CONSISTENCY
 Dataset : https://goo.gl/s1Jo6T
 Analyze → Scale → Reliability Analysis
 Move all the items to be assessed into Items
 Click Statistics. In Descriptives for, tick Item, Scale and Scale if
item deleted. In Inter-Item, tick Correlations
INTERNAL CONSISTENCY
Reliability Statistics (PH)

Cronbach's Alpha    Cronbach's Alpha Based on Standardized Items    N of Items
.869                .869                                            10

 Check the Cronbach's Alpha value


 The value is 0.87, suggesting very good internal consistency
 Values above 0.7 are considered acceptable but values above 0.8
are preferable (good)
 A low value for alpha may mean that there aren’t enough items in the
test or poor interrelatedness between items
 If alpha is very high, it may be due to redundant items (multiple
questions asking the same thing)
INTERNAL CONSISTENCY

 The Corrected Item-Total Correlation values give an indication of the
degree to which each item correlates with the total score
 Low values (< 0.3) indicate that the item is measuring something
different from the scale as a whole
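
A rough sketch of how a corrected item-total correlation can be computed: each item is correlated with the sum of the remaining items, so the item does not inflate its own total. The function names and the data are hypothetical. SPSS reports these values directly, and the code only illustrates the idea.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlate each item with the total of the other items."""
    k = len(rows[0])
    result = []
    for i in range(k):
        item = [r[i] for r in rows]
        rest = [sum(r) - r[i] for r in rows]  # total score without item i
        result.append(pearson(item, rest))
    return result


# Hypothetical data: items 1 and 2 move together, item 3 does not
rows = [[1, 1, 2], [2, 2, 1], [3, 3, 2], [4, 4, 1]]
for number, r in enumerate(corrected_item_total(rows), start=1):
    flag = "  <- below 0.3" if r < 0.3 else ""
    print(f"Item {number}: {r:.2f}{flag}")
```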
INTERNAL CONSISTENCY

 The Cronbach's Alpha if Item Deleted shows the impact of
removing each item from the scale

Reliability Statistics (PH)

Cronbach's Alpha    Cronbach's Alpha Based on Standardized Items    N of Items
.869                .869                                            10
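
The "alpha if item deleted" idea can be sketched by recomputing Cronbach's alpha with each item left out in turn. This is a minimal illustration that assumes the standard alpha formula. The function names and the data are hypothetical and do not reproduce the SPSS output above.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows):
    """Alpha of the remaining items after dropping each item in turn (needs 3+ items)."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(r) if j != i] for r in rows])
        for i in range(k)
    ]


# Hypothetical data: three consistent items plus one unrelated item (the last)
rows = [[1, 1, 1, 2], [2, 2, 2, 1], [3, 3, 3, 2], [4, 4, 4, 1]]
print("alpha, all items:", round(cronbach_alpha(rows), 2))
for number, a in enumerate(alpha_if_item_deleted(rows), start=1):
    print(f"alpha if item {number} deleted: {a:.2f}")
```

An item whose removal raises alpha above the full-scale value is a candidate for review.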
INTERNAL CONSISTENCY
Reliability Statistics (MH)

Cronbach's Alpha    Cronbach's Alpha Based on Standardized Items    N of Items
.658                .679                                            5

 Check the Cronbach's Alpha value
 The value is 0.66, which is below the acceptable level of 0.7
 A low value for alpha could be due to:
 the small number of items in the MH domain (5 items in total)
 poor interrelatedness between items



INTERNAL CONSISTENCY
Item-Total Statistics

Item              Scale Mean if   Scale Variance    Corrected Item-     Squared Multiple   Cronbach's Alpha
                  Item Deleted    if Item Deleted   Total Correlation   Correlation        if Item Deleted
Mental Health 1   17.50           12.014            .408                .329               .610
Mental Health 2   16.97           10.430            .541                .384               .544
Mental Health 3   17.38           10.146            .474                .266               .575
Mental Health 4   17.26           11.535            .519                .387               .568
Mental Health 5   17.57           12.155            .197                .116               .719

 Based on the Corrected Item-Total Correlation values, item MH5 has
a low value of 0.197 (i.e. < 0.3)
INTERNAL CONSISTENCY
Item-Total Statistics

Item              Scale Mean if   Scale Variance    Corrected Item-     Squared Multiple   Cronbach's Alpha
                  Item Deleted    if Item Deleted   Total Correlation   Correlation        if Item Deleted
Mental Health 1   17.50           12.014            .408                .329               .610
Mental Health 2   16.97           10.430            .541                .384               .544
Mental Health 3   17.38           10.146            .474                .266               .575
Mental Health 4   17.26           11.535            .519                .387               .568
Mental Health 5   17.57           12.155            .197                .116               .719

 If MH5 is removed, the new alpha value will increase from 0.66 to
0.72
 We may want to consider removing MH5, but we may not be able
to compare our results with other studies using the original scale
(with 5 items)
INTRACLASS CORRELATION
 A reliability index in test-retest, intra-rater, and inter-rater
reliability analyses
 The measured variable must be a continuous variable
 Interpretation of ICC values
 <0.5 : poor reliability
 0.5 - 0.75 : moderate reliability
 0.75 - 0.9 : good reliability
 > 0.9 : excellent reliability
 For categorical variables, Cohen's kappa (κ) is used
instead
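
For the categorical case, a minimal sketch of Cohen's kappa for two raters, assuming the usual definition kappa = (observed agreement - chance agreement) / (1 - chance agreement). The function name and the ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who categorised the same subjects."""
    n = len(rater_a)
    # Proportion of subjects on which the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical pass/fail ratings of six subjects by two raters
rater_1 = ["pass", "pass", "fail", "pass", "fail", "fail"]
rater_2 = ["pass", "pass", "fail", "fail", "fail", "fail"]
print(round(cohen_kappa(rater_1, rater_2), 2))
```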
INTRACLASS CORRELATION
(TEST-RETEST)
 Dataset : https://goo.gl/RGC62w
 Analyze → Scale → Reliability Analysis
 Move all the items to be assessed into Items
 Click Statistics. Tick Intraclass correlation coefficient.
Select Model: Two-Way Mixed, Type : Absolute
agreement
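
A sketch of the calculation behind a two-way, absolute-agreement ICC, assuming the single-measures form ICC = (MSR - MSE) / (MSR + (k - 1) MSE + (k / n)(MSC - MSE)), where MSR, MSC and MSE are the mean squares for subjects, sessions and error. SPSS reports both single and average measures, and the slides do not say which one is read, so treating it as single measures is an assumption. The function name and the height data are hypothetical.

```python
def icc_absolute_single(data):
    """Two-way, absolute-agreement, single-measures ICC.

    data: n subjects (rows) x k repeated measurements (columns).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Sums of squares for subjects, sessions and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k / n * (msc - mse))


# Hypothetical heights (cm) of five participants measured on two days
heights = [
    [178.2, 178.3],
    [165.0, 165.4],
    [171.5, 171.1],
    [182.3, 182.6],
    [159.8, 160.0],
]
print(round(icc_absolute_single(heights), 3))
```

Because the absolute-agreement type is used, a constant shift between sessions lowers the ICC even when the ranking of participants is unchanged.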
INTRACLASS CORRELATION

 The ICC value is 0.88 and the 95% confidence interval is
0.79 - 0.94
 Therefore, the test-retest reliability of the investigated
method is “good” to “excellent”
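
The reading of the interval above can be sketched by mapping the lower and upper bounds onto the ICC bands listed earlier. The function name is hypothetical. The listed bands share their boundary values, so where exactly 0.5, 0.75 and 0.9 fall is an assumption made here.

```python
def icc_band(value):
    """Map an ICC value onto the interpretation bands from the slides.

    Assumption: 0.5 counts as moderate, 0.75 as good, and 0.9 as good
    (only values above 0.9 are excellent).
    """
    if value < 0.5:
        return "poor"
    if value < 0.75:
        return "moderate"
    if value <= 0.9:
        return "good"
    return "excellent"


icc, lower, upper = 0.88, 0.79, 0.94
print("point estimate:", icc_band(icc))                    # good
print("95% CI:", icc_band(lower), "to", icc_band(upper))   # good to excellent
```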
VALIDITY ANALYSIS
VALIDITY
 The degree to which interpretations of
scores/measures from an instrument lead to correct
conclusions
 The data must be measures of what they are
supposed to measure
VALIDITY
 3 main types of validity:-
1. Construct validity
• Is the questionnaire really able to measure depression
(construct) or does it measure some other constructs such
as the respondent’s coping skills or level of happiness?
2. Content validity
• A satisfaction survey about iPhone 14 should include
questions about the product such as features, quality,
performance, colour, design, price, etc.
3. Criterion validity
• High correlation between the new test and the criterion
will indicate that the new test is actually measuring what
it is supposed to measure
FACTOR ANALYSIS
 Condenses a large set of variables/scale items down to a
smaller number of dimensions or factors
 Summarises the underlying patterns of correlation and
identifies groups of closely related items

         Factor 1   Factor 2
Item 1   0.90       0.10
Item 2   0.85       0.15
Item 3   0.80       0.20
Item 4   0.75       0.25
Item 5   0.25       0.75
Item 6   0.20       0.80
Item 7   0.15       0.85
Item 8   0.10       0.90

         Factor 1   Factor 2
Item 1   0.92       0.10
Item 2   0.88       0.15
Item 3   0.84       0.90
Item 4   0.79       0.86
Item 5   0.75       0.75
Item 6   0.69       0.80
Item 7   0.65       0.85
Item 8   0.10       0.90
FACTOR ANALYSIS
 Dataset : https://goo.gl/s1Jo6T
 Analyze → Dimension Reduction → Factor
 Move all the items to be assessed into Variables
FACTOR ANALYSIS
 Click Descriptives. Tick KMO & Bartlett’s test of
sphericity
 Click Extraction. Method : Principal component,
Analyze : Correlation matrix, Display: Scree plot,
Extract: Based on Eigenvalue
 Click Rotation. Method : Direct Oblimin
FACTOR ANALYSIS
KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy               .776
Bartlett's Test of Sphericity      Approx. Chi-Square         662.036
                                   df                         105
                                   Sig.                       .000

To verify that the data set is suitable for factor analysis:-


 the Kaiser-Meyer-Olkin Measure of Sampling
Adequacy (KMO) value is ≥ 0.6
 the Bartlett's Test of Sphericity value is significant (i.e.
the Sig. value ≤ 0.05)
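
The two suitability checks can be written as a small decision rule. The sketch also checks the degrees of freedom of Bartlett's test, which equal p(p - 1)/2 for p variables, so df = 105 corresponds to 15 items. The function names are hypothetical. SPSS prints .000 for very small p-values, so 0.001 is used here as a conservative upper bound, which is an assumption.

```python
def bartlett_df(n_variables):
    """Degrees of freedom of Bartlett's test of sphericity: p(p - 1) / 2."""
    return n_variables * (n_variables - 1) // 2


def suitable_for_factor_analysis(kmo, bartlett_sig, kmo_min=0.6, alpha=0.05):
    """Apply the two thresholds from the slide: KMO >= 0.6 and Sig. <= 0.05."""
    return kmo >= kmo_min and bartlett_sig <= alpha


print(bartlett_df(15))                             # 105
print(suitable_for_factor_analysis(0.776, 0.001))  # True
```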
FACTOR ANALYSIS
Total Variance Explained

            Initial Eigenvalues                       Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %      Total   % of Variance   Cumulative %
1           4.837   32.249          32.249            3.227   21.512          21.512
2           2.188   14.584          46.833            3.204   21.361          42.873
3           1.707   11.382          58.214            2.087   13.912          56.785
4           1.151   7.674           65.888            1.365   9.103           65.888
5           .787    5.248           71.136
6           .688    4.588           75.724
7           .636    4.238           79.962
8           .606    4.041           84.004
9           .495    3.297           87.301
10          .474    3.161           90.462
11          .396    2.643           93.105
12          .371    2.472           95.576
13          .308    2.051           97.627
14          .198    1.322           98.950
15          .158    1.050           100.000

Extraction Method: Principal Component Analysis.

 Using Kaiser's criterion, we are interested only in components that
have an eigenvalue of ≥ 1
 Only the first four components have eigenvalues ≥ 1
 These four components explain a total of 65.89% of the variance
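
Kaiser's criterion can be checked directly against the initial eigenvalues in the Total Variance Explained table. The sketch below copies those eigenvalues from the table. The variable names are illustrative.

```python
# Initial eigenvalues of the 15 components, copied from the table
eigenvalues = [
    4.837, 2.188, 1.707, 1.151, 0.787, 0.688, 0.636, 0.606,
    0.495, 0.474, 0.396, 0.371, 0.308, 0.198, 0.158,
]

# Kaiser's criterion: keep components with an eigenvalue of at least 1
retained = [e for e in eigenvalues if e >= 1]

# Share of the total variance explained by the retained components
explained = 100 * sum(retained) / sum(eigenvalues)

print(len(retained))        # 4
print(round(explained, 2))  # 65.89
```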


FACTOR ANALYSIS
 Sometimes, too many components are extracted using Kaiser's
criterion
[Scree plot, annotated with "kink" and "Kaiser's criterion"]
 In the scree plot, look for a change (or elbow) in the
shape of the plot
 In this example, there is quite a clear break between the
second and third components
 Therefore, it is recommended to retain (extract) only two
components
FACTOR ANALYSIS
 In the component matrix table, we can see most of the
items load quite strongly (above 0.6) on the first two
components
 This suggests that a two-factor solution is likely to be more
appropriate
FACTOR ANALYSIS
 Using the default options in IBM SPSS Statistics, we
obtained a four-factor solution. It is now necessary to go
back and force a two-factor solution
 Analyze → Dimension Reduction → Factor
 Click Extraction. Extract: Fixed number of factors. Key
in the Factors to extract : 2
FACTOR ANALYSIS
 The Pattern Matrix shows the
factor loadings of each of the
items
 Factor loadings ≥ 0.4 are
considered acceptable
 Items with poor factor loadings
may need to be removed
PH-MH SCALE RELIABILITY & VALIDITY
Items   Mean (SD)     Factor 1   Factor 2   Corrected item-total correlation   Alpha
PF01    1.9 (0.70)    0.54                  0.45                               0.87
PF02    2.2 (0.74)    0.69                  0.63
PF03    2.6 (0.63)    0.61                  0.48
PF04    2.4 (0.67)    0.65                  0.59
PF05    2.7 (0.61)    0.76                  0.62
PF06    2.5 (0.65)    0.74                  0.63
PF07    2.1 (0.75)    0.69                  0.62
PF08    2.4 (0.76)    0.70                  0.64
PF09    2.3 (0.74)    0.72                  0.65
PF10    2.7 (0.63)    0.70                  0.56
MH1     4.2 (1.06)               0.70       0.41                               0.66
MH2     4.7 (1.20)               0.80       0.54
MH3     4.3 (1.38)               0.64       0.47
MH4     4.4 (1.01)               0.76       0.52
MH5     4.1 (1.42)               0.32       0.20

 Low values (< 0.3) of the corrected item-total correlation indicate that
the item is measuring something different from the scale as a whole
 MH5 has a low factor loading (< 0.4)
 The Cronbach alpha value for the MH items (0.66) does not reach the
acceptable level
 Decision should be made either to drop MH5 or revise the item
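
The screening behind this decision can be sketched by applying the two cut-offs from the slides (factor loading ≥ 0.4, corrected item-total correlation ≥ 0.3) to the values in the summary table. The loadings and correlations are copied from that table. The names used in the code are illustrative.

```python
LOADING_MIN = 0.4      # acceptable factor loading
ITEM_TOTAL_MIN = 0.3   # acceptable corrected item-total correlation

# item: (factor loading, corrected item-total correlation), from the table
items = {
    "PF01": (0.54, 0.45), "PF02": (0.69, 0.63), "PF03": (0.61, 0.48),
    "PF04": (0.65, 0.59), "PF05": (0.76, 0.62), "PF06": (0.74, 0.63),
    "PF07": (0.69, 0.62), "PF08": (0.70, 0.64), "PF09": (0.72, 0.65),
    "PF10": (0.70, 0.56), "MH1": (0.70, 0.41), "MH2": (0.80, 0.54),
    "MH3": (0.64, 0.47), "MH4": (0.76, 0.52), "MH5": (0.32, 0.20),
}


def flag_items(table, loading_min=LOADING_MIN, item_total_min=ITEM_TOTAL_MIN):
    """Return the items that fall below either cut-off."""
    return [
        name
        for name, (loading, item_total) in table.items()
        if loading < loading_min or item_total < item_total_min
    ]


print(flag_items(items))  # ['MH5']
```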
