
UNIVERSITY OF SANTO TOMAS

Counseling and Career Center

Psychological Testing
Guidance and Counseling Board Examination Review
(Part I)

Maria Agnes A. Buquid-Bonifacio


July 28, 2018
Prayer Before Study by St. Thomas
Aquinas, OP
Creator of all things, true source of light and wisdom, origin of all
being, graciously let a ray of your light / penetrate the darkness
of my understanding.
Take from me the double darkness / in which I have been born,
an obscurity of sin and ignorance.
Give me a keen understanding, a retentive memory, and the
ability to grasp things correctly and fundamentally.
Grant me the talent of being exact in my explanations / and the
ability to express myself with thoroughness and charm.
Point out the beginning, direct the progress, and help in the
completion. I ask this through Jesus Christ our Lord. Amen.
References
• Psychological Testing (Kaplan, Cohen, Gregory)
• Alexa Abrenica’s Workbook
• Bajo (Reviewer)
• Mastering the National Counselor Examination and the
Counselor Comprehensive Examination (Erford, Hays,
Crockett)

• De Jesus
• Munarriz, Cervera
• DSM 5
• Behavioral Research Method
• Assessment of Children and Adolescents
• List of Psychological Tests (Internet)
Psychological Testing
• Psychological testing means statistics
Purposes:
✏ Description
✏ Make Inferences
Psychological Testing
✏ a standardized measurement of a
sample of behavior
✏ establishing norms
✏ important test items that correspond
to what the test is to discover about the
test-taker
✏ based on the uniformity of procedures
in administering and scoring the test
Standardized
✏ There is an established
reference point that a test scorer
can use to evaluate, judge,
measure against, and compare.

• How do we get this reference point?
• Norms are established
Norms
✏ Relies on the number of test takers
who take a given test, to establish
what is normal in the group
✏ Then scorers can determine where
an individual falls within that group
✏ The larger the sample, the better!
Test Items
✏ The questions that a test-taker is
asked on any given test
✏ Must be relevant to what the test is
trying to measure
✏ Must have large sets in order to
establish a proper measurement
Uniformity of Procedures in
Administering and Scoring

✏ Administrators present the test the same way
✏ Test takers take the test the same
way
✏ Scorers score the test the same way
This helps with
✏ Validity
✏ Reliability
Scales of Measurement
Frequency Distribution
• Frequency is how often something occurs.
• By counting frequencies we can make a
Frequency Distribution table.
• Frequency Distribution: values and their
frequency (how often each value occurs).
• Normal Distribution: Bell Curve
• Measures of Central Tendency (Mean, Median, Mode)
• Measures of Spread (Range, Percentiles, Standard Deviation)
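The counting step described above can be sketched in a few lines of Python. The scores are illustrative only (they reuse the exercise values that appear later in this review), and the variable names are assumptions made for the example.

```python
from collections import Counter

# Illustrative scores (assumed for this sketch).
scores = [4, 2, 1, 1, 2, 1, 4, 1]

# Count how often each value occurs, then sort by value
# so the result reads like a frequency distribution table.
frequency = Counter(scores)
frequency_table = sorted(frequency.items())

for value, count in frequency_table:
    print(f"score {value}: frequency {count}")
```

Each row pairs a value with the number of times it occurs, which is all a frequency distribution table contains.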
Describing Distributions
• Mean
• Standard Deviation
• Z score
• T-score
• Quartiles and Deciles
Statistical Symbols
Mean
• The average score in a distribution
Exercise
Find the mean:

4, 2, 1, 1, 2, 1, 4, 1
Median and Mode
• Median

• Mode
Exercise
Find the median and mode:

4, 4, 2, 2, 1, 1, 1, 1
Mean = 2
Median = 1.5
Mode = 1
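The answers above can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are chosen for the example.

```python
import statistics

# Scores from the exercise above.
scores = [4, 4, 2, 2, 1, 1, 1, 1]

mean = statistics.mean(scores)      # sum of scores divided by the number of scores
median = statistics.median(scores)  # middle value; average of the two middle scores when n is even
mode = statistics.mode(scores)      # most frequently occurring score

print(mean, median, mode)
```

Because there are eight scores, the median is the average of the 4th and 5th scores in sorted order (1 and 2), which gives 1.5.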
Range
• The Range is the difference between the
lowest and highest values.
Standard Deviation

• Approximation of the average deviation around the mean
– a number used to tell how measurements for a
group are spread out from the average (mean), or
expected value.
– A low standard deviation means that most of the
numbers are very close to the average.
– A high standard deviation means that the
numbers are spread out.
Standard Deviation
• The square root of the average squared deviation around the mean
Standard Deviation
Consider a group having the following eight
numbers/scores:
2, 4, 4, 4, 5, 5, 7, 9

These eight numbers have an average (mean) of 5:
(2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 40 / 8 = 5
Standard Deviation
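To calculate the population standard deviation,
first find the difference of each number in the
list from the mean. Then square the result of
each difference:
(2 − 5)² = 9, (4 − 5)² = 1, (4 − 5)² = 1, (4 − 5)² = 1,
(5 − 5)² = 0, (5 − 5)² = 0, (7 − 5)² = 4, (9 − 5)² = 16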
Standard Deviation
Next, find the average of these values (sum
divided by the number of numbers). Last, take
the square root:
(9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 = 32 / 8 = 4, and √4 = 2

The answer is the population standard deviation. This formula applies only if
the eight numbers we started with are the whole group. If they are only a
part of the group picked at random, then we should divide by 7 (which is n − 1)
instead of 8 (which is n) when averaging the squared differences.
Then the answer is the sample standard deviation.
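The difference between the two denominators can be seen with the `statistics` module from Python's standard library, where `pstdev` divides by n and `stdev` divides by n − 1. This is a small sketch using the eight scores from the example; the variable names are assumptions.

```python
import statistics

# The eight scores from the worked example.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: squared deviations averaged over n = 8.
population_sd = statistics.pstdev(scores)

# Sample standard deviation: squared deviations divided by n - 1 = 7.
sample_sd = statistics.stdev(scores)

print(population_sd)
print(round(sample_sd, 3))
```

The sample value is slightly larger than the population value because dividing by n − 1 compensates for the tendency of a sample to underestimate the spread of the whole group.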
Z score
• Difference between a score and the mean, divided by
the standard deviation
• The deviation of a score from the mean in standard
deviation units
Z score

Example 1:
X=6
Mean = 3
S=3
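Example 1 can be worked through with a short function that follows the definition given earlier (score minus mean, divided by the standard deviation). The function name `z_score` is chosen for this sketch.

```python
def z_score(x, mean, sd):
    """Deviation of a score from the mean, in standard deviation units."""
    return (x - mean) / sd

# Values from Example 1: X = 6, Mean = 3, S = 3.
z = z_score(6, 3, 3)
print(z)
```

A score of 6 is therefore one standard deviation above the mean.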
Percentile Ranks
• Answers the question, “What percent of the scores fall below
a particular score?”
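A minimal sketch of that question in Python, using the definition on this slide (percent of scores strictly below the score of interest). The scores and the function name are assumptions for illustration; some textbooks also count half of the tied scores, which this sketch does not do.

```python
def percentile_rank(scores, score):
    """Percent of scores that fall below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Illustrative scores (assumed for this sketch).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
rank = percentile_rank(scores, 7)
print(rank)
```

Six of the eight scores fall below 7, so a score of 7 has a percentile rank of 75.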
T score
• Exactly the same as standard scores (Z scores)
except that the mean is 50 rather than 0 and
the standard deviation is 10 rather than 1.

T = 10Z + 50
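The conversion is a one-line calculation. The sketch below applies T = 10Z + 50 to a few Z scores; the function name `t_score` is chosen for the example.

```python
def t_score(z):
    """Convert a Z score to a T score (mean 50, standard deviation 10)."""
    return 10 * z + 50

# A score at the mean, one SD above, and one and a half SDs below.
for z in (0, 1, -1.5):
    print(z, t_score(z))
```

A Z score of 0 maps to T = 50, Z = 1 maps to T = 60, and Z = −1.5 maps to T = 35, so T scores avoid the negative values and decimals that Z scores often have.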
Quartiles and Deciles
• Quartiles are points that divide the distribution
into equal fourths. The first quartile is the 25th
percentile, the second is the median or the 50th
percentile; and the third quartile is the 75th
percentile.
• Deciles are similar to quartiles except that they
use points that mark 10% rather than 25%
intervals. The top decile, or D9, is the point below
which 90% of the cases fall, D8 marks the 80th
percentile, and so forth.
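As a rough illustration, Python's `statistics.quantiles` returns the cut points that divide a distribution into n equal parts. The data (scores 1 to 100) are assumed for this sketch, and the `inclusive` method is only one of several conventions for computing cut points, so hand calculations from a textbook may differ slightly.

```python
import statistics

# Illustrative data: scores from 1 to 100 (assumed for this sketch).
data = list(range(1, 101))

# Three cut points divide the distribution into fourths (Q1, Q2, Q3).
quartiles = statistics.quantiles(data, n=4, method="inclusive")

# Nine cut points divide the distribution into tenths (D1 through D9).
deciles = statistics.quantiles(data, n=10, method="inclusive")

print(quartiles)
print(deciles)
```

The second quartile equals the median, and the last decile cut point (D9) sits near 90, the point below which about 90% of the cases fall.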
Correlation
• Scatter diagram
– Picture of the relationship between two variables
• Correlational analysis is designed primarily to
examine linear relationships between
variables
• A correlation coefficient is a mathematical
index that describes the direction and
magnitude of a relationship.
Correlation Coefficient

• a score that expresses both the strength and direction of straight-line correlation

• ranges between -1.00 and +1.00


Correlation Coefficient   Description
-1.00                     perfect negative correlation
-.60                      strong negative correlation
-.30                      moderate negative correlation
-.10                      weak negative correlation
.00                       no correlation
+.10                      weak positive correlation
+.30                      moderate positive correlation
+.60                      strong positive correlation
+1.00                     perfect positive correlation
Curvilinear Relationship
• a relationship between X and Y that begins
as positive and then becomes negative, or vice
versa (e.g. relationship between anxiety
levels and English test scores)
Pearson's Correlation Coefficient (r)
• Pearson Product-Moment Correlation Coefficient
• first introduced by Francis Galton
• named after Karl Pearson, who developed the correlational method to do agricultural
research
Pearson's Correlation Coefficient (r)
• requirements for the use of Pearson's r:
– For N greater than or equal to 30
– A straight-line relationship
– Interval data – two sets of scores or data, so
that scores may be assigned to the
respondents
– Random sampling (for applying a test of
significance)
Pearson's Correlation Coefficient (r)

• computational formula:
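The formula itself did not survive on this slide. The sketch below uses the standard raw-score (computational) form, r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}, which is an assumption about what the slide showed. The data are illustrative and far smaller than the N of 30 listed in the requirements.

```python
import math

def pearson_r(x, y):
    """Pearson r from raw scores, using the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Illustrative data only (assumed for this sketch).
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x
y_down = [10, 8, 6, 4, 2]  # falls as x rises

print(pearson_r(x, y_up))
print(pearson_r(x, y_down))
```

The first pair lies on a rising straight line and the second on a falling one, so they land at the two ends of the −1.00 to +1.00 range.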
Coefficient of Determination (r²)
• once we have computed r, it is useful to
compute r², which estimates the amount
of variability in scores on one variable that
can be explained by the other variable
• e.g. r = .40
r² = .16
*meaning 16% of the variability in Y can be explained by X,
and the remaining 84% can be explained by
other factors
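Formula of IQ
IQ = (MA / CA) × 100
MA = mental age, CA = chronological age
*based on Alfred Binet's concept of mental age; the ratio form is credited to William Stern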
IQ
Exercises:
• MA = 7, CA = 7, IQ = ?
• MA = 20, CA = 16, IQ = ?
• MA = 34, CA = 40, IQ = ?
• MA = 30, CA = 15, IQ = ?
IQ
Exercises (with answers):
• MA = 7, CA = 7, IQ = 100
• MA = 20, CA = 16, IQ = 125
• MA = 34, CA = 40, IQ = 85
• MA = 30, CA = 15, IQ = 200
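The answers are consistent with the ratio IQ, mental age (MA) divided by chronological age (CA), multiplied by 100. A short sketch that reproduces them, with the function name `ratio_iq` chosen for the example:

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age over chronological age, times 100."""
    return 100 * mental_age / chronological_age

# (MA, CA) pairs from the exercises above.
exercises = [(7, 7), (20, 16), (34, 40), (30, 15)]
answers = [ratio_iq(ma, ca) for ma, ca in exercises]
print(answers)
```

An IQ of 100 means mental age equals chronological age; values above 100 mean mental age is ahead of chronological age, and values below 100 mean it lags behind.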
Reliability vs Validity
Reliability

Classical test score theory
X = T + E
(X = observed score, T = true score, E = error)
Reliability
• Standard Error of Measurement (SEM)
✏ Standard deviation of errors
✏ Estimates how repeated measures of a
person on the same instrument tend to be
distributed around his or her “true” score.
✏ The true score is always an unknown because no
measure can be constructed that provides a perfect
reflection of the true score.
• Ex. Rubber-yardstick as measure
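The slides do not give a formula for the SEM. A common one in classical test theory is SEM = SD × √(1 − reliability), and the sketch below assumes that formula. The standard deviation of 15 and reliability of .91 are illustrative values, not taken from the slides.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability); a common classical test theory formula."""
    return sd * math.sqrt(1 - reliability)

# Illustrative values (assumed): test SD of 15, reliability coefficient of .91.
sem = standard_error_of_measurement(15, 0.91)
print(round(sem, 2))
```

With these values the SEM is about 4.5 score points. The more reliable the test, the smaller the SEM, and the more tightly repeated scores would cluster around the true score.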
Models of Reliability
• Time sampling: the Test-retest Method
– Administer the same test on two well-specified
occasions, then find the correlation
between the scores from the two administrations
• Item sampling: Parallel Forms Method
– Compares two equivalent forms of a test that
measure the same attribute
• Split-half Method
Split-half Method
• A test is given and divided into halves that are
scored separately
• Commonly uses the odd-even system
• Correlation of the two halves is computed
• Uses the Spearman-Brown formula to estimate the reliability of the full-length test
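The Spearman-Brown step is needed because each half is only half as long as the full test, so the correlation between halves understates the reliability of the whole test. A sketch of the usual split-half correction, corrected r = 2r / (1 + r), with an illustrative half-test correlation of .60 (assumed):

```python
def spearman_brown_split_half(r_half):
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)

# Illustrative value (assumed): the odd and even halves correlate .60.
full_test_reliability = spearman_brown_split_half(0.60)
print(round(full_test_reliability, 2))
```

A correlation of .60 between halves corresponds to an estimated reliability of .75 for the full-length test.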
Other Methods for Estimating the
Internal Consistency of a Test
• KR20 Formula
– Does not split the test into two halves
– It considers all the individual item variances
• Coefficient Alpha
– Similar to KR20 formula
– For types of tests with no right or wrong answer
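A minimal sketch of coefficient alpha, assuming the standard formula alpha = [k / (k − 1)] × [1 − (sum of item variances) / (variance of total scores)], where k is the number of items. When every item is scored 0 or 1, this gives the same result as KR-20. The response matrix is made up for illustration.

```python
import statistics

def coefficient_alpha(item_scores):
    """Coefficient alpha; item_scores has one row per examinee, one column per item."""
    k = len(item_scores[0])
    # Variance of each item (column) across examinees.
    item_variances = [statistics.pvariance(column) for column in zip(*item_scores)]
    # Variance of the examinees' total scores.
    total_variance = statistics.pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Illustrative right/wrong (1/0) responses: 4 examinees, 3 items (assumed data).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = coefficient_alpha(responses)
print(alpha)
```

Unlike the split-half method, no particular division of the test is chosen; every item's variance enters the calculation.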
What to do about low reliability?
• Increase the number of items
• Factor and item analysis
• Correction for attenuation
What to do about low reliability?
• Increase the number of items
– The reliability of the test increases as the number
of items increases
– The Spearman-Brown prophecy formula can estimate
how many items will have to be added in order to
bring a test to an acceptable level of reliability.
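A sketch of that estimate, assuming the usual rearrangement of the Spearman-Brown formula: the test must be lengthened by a factor of N = r_desired × (1 − r_observed) / [r_observed × (1 − r_desired)]. The 20-item test with reliability .70 and the target of .90 are illustrative values.

```python
import math

def lengthening_factor(observed_r, desired_r):
    """How many times longer the test must be to reach the desired reliability."""
    return desired_r * (1 - observed_r) / (observed_r * (1 - desired_r))

# Illustrative values (assumed): a 20-item test with reliability .70, target .90.
current_items = 20
factor = lengthening_factor(0.70, 0.90)
items_needed = math.ceil(current_items * factor)
items_to_add = items_needed - current_items

print(round(factor, 2), items_needed, items_to_add)
```

The estimate assumes the added items are comparable in quality to the existing ones; poor items would not raise reliability as predicted.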
What to do about low reliability?
• Factor and item analysis
– Tests are most reliable when they are
unidimensional. One factor should account for
more of the variance than any other factor.
– Items that do not load on this factor might best be
omitted.
– Uses discriminability analysis
• When the correlation between the performance on a
single item and the total test score is low, the item is
probably measuring something different from the other
items on the test
What to do about low reliability?
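• Correction for attenuation
– Estimates what the correlation between two variables
would be if neither were affected by measurement error
– To use the method, one needs to know only the
reliabilities of two tests and the correlation
between them
– E.g. happiness and scholastic achievement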
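A sketch of the correction, assuming the standard formula: corrected r = observed r / √(reliability of test 1 × reliability of test 2). The numbers are illustrative and are not taken from the happiness and achievement example.

```python
import math

def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Estimated correlation between two measures if both were perfectly reliable."""
    return r_xy / math.sqrt(reliability_x * reliability_y)

# Illustrative values (assumed): observed r = .30, both reliabilities = .80.
corrected_r = correct_for_attenuation(0.30, 0.80, 0.80)
print(round(corrected_r, 3))
```

The corrected value (.375) is higher than the observed .30 because measurement error in either test weakens the correlation that can be observed between them.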
Validity
• The agreement between a test score or
measure and the quality it is believed to
measure.
• “Does the test measure what it is
supposed to measure?”
• Types
– Content-related
– Criterion-related
– Construct-related
Content-Related Validity
• Considers the adequacy of representation of the
conceptual domain the test is designed to cover
– E.g., the score on your history test should represent your
comprehension of the history you are expected to know
• Construct underrepresentation
– The failure to capture important components of a construct
– Ex., measure of general mathematical knowledge (but only
algebra problems are included)
• Construct-irrelevant variance
– Occurs when scores are influenced by other factors
irrelevant to the construct
– Ex., test anxiety, reading comprehension
Criterion-related Evidence for Validity
• How well a test corresponds with a particular
criterion
• Predictive validity evidence
– Forecasting function of test
– Ex., entrance test and grades
• Concurrent validity evidence
– Test scores and criterion measures are taken at the
same time
– Ex., learning disability test and school performance
Validity Coefficient
• The relationship between a test and
a criterion
• Validity coefficients in the range of
.30 - .40 are commonly considered
high
Construct-related Evidence for Validity
• Established through a series of activities
in which a researcher simultaneously
defines some construct and develops
the instrumentation to measure it
• Types
– Convergent Evidence
– Discriminant evidence
Convergent Validity
• When a measure correlates well with
other tests believed to measure the
same construct
• Ex., those who score low on health
index are expected to visit the doctors
more often
Discriminant Validity
• A test should have low correlations with
measures of unrelated constructs, or
evidence for what the test does not
measure
• Providing evidence that a test measures
something different from other tests,
providing evidence that it is a unique
construct
• Ex., health index should not correlate with
IQ
Item Difficulty
An item’s difficulty level is usually measured in
terms of the percentage of examinees who
answer the item correctly. This percentage is
referred to as the item difficulty index, or "p".

Optimum Item Difficulty
2 alternatives (true-false) = .75
3 alternatives (multiple-choice) = .67
4 alternatives (multiple-choice) = .63
5 alternatives (multiple-choice) = .60
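The values listed above match a common rule of thumb: the optimum difficulty is the midpoint between the chance success rate (1 divided by the number of alternatives) and 1.00. That rule is an assumption about how the slide's values were derived. The sketch also computes the item difficulty index p for a made-up set of responses.

```python
def item_difficulty(responses):
    """Proportion of examinees answering the item correctly (1 = correct, 0 = wrong)."""
    return sum(responses) / len(responses)

def optimum_difficulty(n_alternatives):
    """Midpoint between chance-level success and a perfect score of 1.00."""
    chance = 1 / n_alternatives
    return (1.0 + chance) / 2

# Illustrative responses to one item from four examinees (assumed data).
p = item_difficulty([1, 1, 0, 1])
print(p)

for k in (2, 3, 4, 5):
    print(k, optimum_difficulty(k))
```

For four alternatives the midpoint is .625, which appears on the slide rounded to .63.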
Item Discrimination
Refers to the degree to which the items
differentiate among examinees in terms of the
characteristic being measured (e.g., between
high and low scorers).
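One common way to quantify this is the discrimination index D: the proportion of high scorers who answer the item correctly minus the proportion of low scorers who do. The slides do not name a specific index, so this choice, the function name, and the group sizes are all assumptions for illustration.

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D = proportion correct in the high-scoring group minus the low-scoring group."""
    return upper_correct / upper_n - lower_correct / lower_n

# Illustrative counts (assumed): 8 of 10 high scorers and 3 of 10 low scorers
# answered the item correctly.
d = discrimination_index(8, 10, 3, 10)
print(round(d, 2))
```

A positive D means high scorers do better on the item than low scorers. A value near zero, or a negative one, suggests the item does not differentiate among examinees in the intended direction.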
Remember
If a test is unreliable, it cannot be valid.
For a test to be valid, it must be reliable.
However, just because a test is reliable
does not mean it will be valid.
For questions, you may reach me thru:
mbbonifacio@ust.edu.ph

May our good Lord God
be with you in
this journey!