
PRINCIPLES/INTRO

PSYCHOLOGICAL TEST - A set of items designed to measure characteristics of human beings that pertain to behavior.
PSYCHOLOGICAL ASSESSMENT - Gathering information using tools: tests, interviews, case studies, observations.
  Collaborative - the assessor & assessee may work as partners
  Therapeutic
  Dynamic
PROJECTIVE TESTS: ambiguous, unclear test stimulus; unlimited responses; tap wishes, intrapsychic conflicts, desires, and unconscious motives. Subjectivity in interpretation/clinical judgement. Self-administered/individual tests. Results are integrated into a single score interpretation.
APTITUDE TESTS: predict the capacity for acquiring skills or competencies. Ex. Differential Aptitude Test
SCALES - Relate raw scores to some theoretical/empirical distribution.
SCORING - *cut score*

SCALES OF MEASUREMENT (IRON)
                            Mag.   Eq. Int.   Abs. 0
INTERVAL: temp., time, IQ    /        /          -
ORDINAL: ranking             /        -          -
RATIO: weight, height        /        /          /
NOMINAL: no ranking          -        -          -

PARAMETRIC: normal distribution of scores (Pearson r)
NONPARAMETRIC: abnormal distribution of scores (Spearman; Chi-square for nominal data)

ASSESSMENT TECHNIQUES (D I T O)
DOCUMENTS - records, protocols, collateral reports
INTERVIEWS - interview responses; structured or unstructured
TESTS - initial assessment > screening > verification; written, verbal, visual
OBSERVATION - behavioral observation; observation checklist

DISTRIBUTION - How frequently each value was obtained.
  Abnormal distribution - skewed
  Normal distribution - falls on the central tendency (mean, median, mode)
FREQUENCY DISTRIBUTION (a short computational sketch follows the bias list below)
  Mean - average score
  SD - approximation of the average deviation around the mean; the square root of the variance
  Z-scores - the difference between a score and the mean, divided by the SD
POSITIVE SKEW - the tail falls at the high end of the distribution. *Means the test is too difficult.
NEGATIVE SKEW - the tail falls at the lower end of the distribution. *Means the test is too easy.
PERCENTILE RANK - The percentage of people whose scores on a test fall below a particular raw score.
  Percentile: a specific score within a distribution

BIAS SOURCES
RESPONSE SET - A rater marks the same place on the rating scale regardless of the examinee's performance.
LENIENCY ERROR - Gives high, positive ratings despite differences in examinees' performance.
SEVERITY ERROR - Gives low, negative ratings despite differences in examinees' performance.
CENTRAL TENDENCY ERROR - Gives middle-range ratings (e.g., on a Likert scale).
PROXIMITY ERROR - Differing skills are rated similarly when sequentially ordered, as in a process.
HALO ERROR - The performance rating is influenced by unrelated impressions.
LOGICAL ERROR - A poorly worded skill specification is interpreted in an unintended manner.
LACK OF INTEREST ERROR - The rater is not really interested in the process.
IDIOSYNCRATIC ERROR - Unexpected and unpredictable ratings given for any number of reasons.
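A minimal Python sketch of the mean, SD, z-score, and percentile-rank definitions above; the raw scores are hypothetical.

```python
from statistics import mean, pstdev

scores = [10, 12, 15, 15, 18, 20, 22, 25]  # hypothetical raw scores

m = mean(scores)     # average score
sd = pstdev(scores)  # SD: square root of the (population) variance

def z_score(x):
    """Difference between a score and the mean, divided by the SD."""
    return (x - m) / sd

def percentile_rank(x):
    """Percentage of scores in the distribution that fall below x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

print(z_score(18), percentile_rank(18))
```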
PSYCHOLOGICAL TESTS
ABILITY TESTS
  INTELLIGENCE TESTS: general potential to solve problems
    Verbal intelligence
    Non-verbal intelligence
    Ex. WAIS, Stanford-Binet Intelligence Scale, Culture Fair Intelligence Test
  ACHIEVEMENT TESTS: previous learning. Measure the extent of one's knowledge of various academic subjects.
    Ex. Stanford Achievement Test in Reading
PERSONALITY TESTS: traits/domains/factors. Usually no right or wrong answers. Ex. MBTI
  OBJECTIVE TESTS: structured; Yes/No or True/False items. Standardized test administration, scoring, and interpretation of scores. Limited number of responses. Group tests.
NORMS: where we base the scores.
  o Norm-referenced test (NRT) - test takers perform better or worse relative to others (ex. age norms)
  o Criterion-referenced test (CRT) - relates to the content of the test (ex. there is a certain criterion to be met)

WHICH TYPE OF RELIABILITY IS APPROPRIATE?
  Test has two forms -> Parallel-Forms Reliability
  Test designed to be administered to an individual more than once -> Test-retest Reliability
  Tests with factorial purity -> Cronbach's Alpha
  Test with items carefully ordered according to difficulty -> Split-half Reliability
  Tests involving some degree of subjective scoring -> Inter-rater Reliability
  Tests involving dichotomous items -> KR20
  Dynamic characteristics (ever-changing characteristics that vary through time or situation) -> Internal Consistency
  Static characteristics (characteristics that would not vary across time) -> Test-retest and Parallel-Form Reliability
VALIDITY
- the test measures what it purports to measure

CONTENT VALIDITY
- the essence of what you are measuring; consists of topics and processes
- often established by expert judgement
- GENERALIZABILITY: the examiner generalizes from the sample of items to the content mastery possessed by the individual examinee
- EDUCATIONAL CONTENT-VALID TEST - follows a TOS (table of specifications)
- EMPLOYMENT CONTENT-VALID TEST - covers appropriate job-related skills; reflects the job specification
- CLINICAL CONTENT-VALID TEST - symptoms of disorders are covered; reflects the diagnostic criteria
- CONSTRUCT UNDERREPRESENTATION - failure to capture important components of a construct
- CONSTRUCT-IRRELEVANT VARIANCE - test scores are influenced by factors irrelevant to the construct
- CONTENT VALIDITY RATIO (CVR) - Lawshe proposed a structured & systematic way of establishing the content validity of a test (see the sketch at the end of this section)

CRITERION-RELATED VALIDITY
- the degree to which a test corresponds with a particular criterion
- criterion = standard; characteristics: relevant, valid and reliable, uncontaminated
- Criterion contamination - the criterion is based on predictor measures
- performance on the first measure should be highly correlated with performance on the second
CONCURRENT
- correlates with what is occurring now
- both the test scores and the criterion measures are obtained at present
- administered to the same subjects as the measure being validated
- the criterion must be valid, reliable, and considered a standard
- often confused with a construct validity strategy
PREDICTIVE
- correlates with what occurs in the future
- test scores are obtained at one time; the criterion measures are obtained in the future, after an intervening event
- performance is predicted based on one or more known measured variables
- ex. MAT, GRE, GMAT

CONSTRUCT VALIDITY
- a construct is an informed scientific idea developed or hypothesized to describe or explain a behavior; something built by mental synthesis; an unobservable, presupposed trait
- required when no criterion or universe of content is accepted as entirely adequate to define the quality being measured
- a test has good construct validity if there is an existing psychological theory that can support what the test items are measuring
- uses both logical analysis and empirical data
- more general than specific; provides a frame of reference
EVIDENCES:
1. The test is homogeneous, measuring a single construct.
2. Test scores increase or decrease as a function of age, the passage of time, or experimental manipulation.
3. Pretest-posttest differences.
4. Test scores differ between groups.
5. Test scores correlate with scores on other tests in accordance with what is predicted.
UNIDIMENSIONAL - one construct; MULTIDIMENSIONAL - several constructs
CONVERGENT
- the test correlates well with another measure; the two measure the same construct
- the two measures are intended to measure the same construct but are NOT administered in the same fashion
- ex. a depression test and a Negative Affect Scale
DIVERGENT
- also called discriminant validity
- a validity coefficient showing little or no relationship between the newly created test and an existing test measuring something different
- ex. a Social Desirability test and a Marital Satisfaction test
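Lawshe's CVR has a standard closed form, CVR = (ne - N/2) / (N/2), where ne is the number of panelists rating an item "essential" and N is the panel size. A minimal sketch with a hypothetical 10-person expert panel:

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: (ne - N/2) / (N/2).
    Ranges from -1 (no panelist says 'essential') to +1 (all do)."""
    half = n_panelists / 2
    return (n_essential - half) / half

# e.g., 9 of 10 hypothetical experts rate an item essential -> CVR = 0.8
print(content_validity_ratio(9, 10))
```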

RELIABILITY
- the consistency of a test
- indicates how stable a test score is
- a test should produce similar results consistently if it measures the same thing
- A TEST CAN BE RELIABLE WITHOUT BEING VALID

TEST-RETEST RELIABILITY
- Stability (Will the scores be stable over time?)
- Pearson r
- gives the same test to the same group of test takers at 2 different times
- carryover effect: when the interval is too short, the first testing session influences the results of the second, which can affect the test-retest reliability of a psychological measure
- practice effect: a type of carryover effect wherein scores on the second administration are higher than they were on the first
- used only for measuring traits/characteristics that do not change over time
- error variance: corresponds to the random fluctuations of performance from one test session to the other

PARALLEL-FORM RELIABILITY
- r
- Equivalence (Are the two forms of the test equivalent?)
- different forms of the same test are administered to the same group at different times -> high reliability coefficient
- the forms should contain the same number of items, expressed in the same form and covering the same type of content; the range and level of difficulty of the items should also be equal; instructions, time limits, illustrative examples, format, and all other aspects of the test must likewise be checked for equivalence
- PROBLEM: the difficulty of developing another form

INTERNAL CONSISTENCY (computational sketch at the end of this section)
- How well does each item measure the content/construct under consideration?
- used when a test is administered once
- there is consistency among items within the test; if all items on a test measure the same construct, the test has good internal consistency
*SPLIT-HALF RELIABILITY - Spearman-Brown prophecy formula; splitting the items on a questionnaire or test in half (odd/even), computing a separate score for each half, and then calculating the degree of consistency between the two scores for a group of participants
*CRONBACH'S ALPHA - used when the two halves of the test have unequal variances; provides the lowest estimate of reliability; the average of all possible split halves. Ex. Likert-scale items
*KR20 - for binary/dichotomous items; tests with a right-or-wrong format

INTER-RATER RELIABILITY
- Kappa statistics
- different raters, using a common rating form, measure the object of interest consistently
- Are the raters consistent in their ratings?
*Cohen's Kappa - used to gauge agreement between 2 raters
*Fleiss' Kappa - used to gauge agreement among 3 or more raters
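A minimal Python sketch of the three internal-consistency estimates above. The 5-person x 4-item response matrix is hypothetical, and variances are computed as population variances; on dichotomous data like this, alpha and KR20 coincide.

```python
from statistics import pvariance

# hypothetical responses: rows = 5 test takers, columns = 4 items (1 = correct)
X = [[1, 0, 1, 1],
     [1, 1, 1, 0],
     [0, 0, 1, 0],
     [1, 1, 1, 1],
     [0, 1, 0, 0]]

k = len(X[0])                      # number of items
totals = [sum(row) for row in X]   # total scores per test taker
var_total = pvariance(totals)
item_vars = [pvariance([row[j] for row in X]) for j in range(k)]

# Cronbach's alpha: (k / (k-1)) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / var_total)

# KR20: the dichotomous special case, using p*q in place of item variances
pq = []
for j in range(k):
    p = sum(row[j] for row in X) / len(X)  # proportion answering item j correctly
    pq.append(p * (1 - p))
kr20 = (k / (k - 1)) * (1 - sum(pq) / var_total)

# Split-half (odd/even items) corrected by the Spearman-Brown prophecy formula
def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    return cov / (pvariance(a) ** 0.5 * pvariance(b) ** 0.5)

odd = [row[0] + row[2] for row in X]
even = [row[1] + row[3] for row in X]
r_half = pearson_r(odd, even)
split_half = 2 * r_half / (1 + r_half)  # Spearman-Brown correction

print(alpha, kr20, split_half)
```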
I. DESCRIPTION OF THE GROUP
   A. Central Tendency
   B. Variability
   C. Standard scores
   D. Frequencies
II. CORRELATING VARIABLES
   A. Pair of interval/continuous variables -> Pearson r
   B. Pair of ordinal variables -> Spearman Rho
   C. Pair of dichotomous variables (both alternatives) -> KR20
   D. One continuous and one dichotomous variable
      a. True dichotomy -> Point Biserial
      b. Artificial dichotomy -> Biserial
   E. Agreement among 3 or more raters -> Kendall's Coefficient of Concordance
III. COMPARISON OF GROUPS
   A. Random Sampling (parametric)
      a. 2 separate groups w/ individual measures -> Independent t-test
      b. 1 group, 2 scores -> Dependent t-test
      c. 3 or more separate groups -> One-way ANOVA
      d. 1 group, 3 or more scores -> Repeated-measures ANOVA
      e. 2 or more groups, repeated measures per group -> Split-plot (mixed-design) ANOVA
      f. 2 IVs, 1 DV -> Two-way ANOVA (i. 4 groups - 2x2 design)
   B. Non-Random Sampling (nonparametric)
      a. 2 separate groups -> Mann-Whitney U
      b. 1 group, 2 ordinals -> Wilcoxon Signed-Rank Test
      c. 3 or more groups -> Kruskal-Wallis H test
      d. 3 or more ranks -> Friedman Test
      e. 1 group sorted into categories/frequencies -> Chi-square
IV. PREDICTING VARIABLES
   A. One-to-one -> Linear Regression
   B. More than one to one (X1 + X2 + X3 = Y) -> Multiple Regression
   C. Sets of predictors, significant or not -> Hierarchical Regression
      M1: X1 = Y
      M2: X1 + X2 = Y
      M3: X1 + X2 + X3 = Y
   D. Sets of predictors, all significant -> Stepwise Regression
      M1: X1* = Y
      M2: X1* + X2* = Y
      M3: X1* + X2* + X3* = Y
   E. Outcome is nominal -> Logistic Regression
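A sketch mapping several branches of the table above onto scipy.stats calls, assuming scipy is installed; the arrays are hypothetical stand-ins for real data.

```python
from scipy import stats

x = [2.1, 3.4, 3.9, 5.0, 6.2]
y = [1.8, 3.0, 4.1, 4.8, 6.5]
g1, g2, g3 = [5, 6, 7], [6, 7, 8], [8, 9, 10]

stats.pearsonr(x, y)                 # II.A   pair of interval/continuous
stats.spearmanr(x, y)                # II.B   pair of ordinal
stats.ttest_ind(g1, g2)              # III.A.a  2 separate groups
stats.ttest_rel(x, y)                # III.A.b  1 group, 2 scores
stats.f_oneway(g1, g2, g3)           # III.A.c  3 or more separate groups
stats.mannwhitneyu(g1, g2)           # III.B.a  2 separate groups (nonparametric)
stats.wilcoxon(x, y)                 # III.B.b  1 group, 2 ordinals
stats.kruskal(g1, g2, g3)            # III.B.c  3 or more groups (nonparametric)
stats.friedmanchisquare(g1, g2, g3)  # III.B.d  3 or more ranks
stats.linregress(x, y)               # IV.A   one-to-one prediction
```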

ASSESSMENT vs. TESTING
ASSESSMENT
- A broad array of evaluative processes
- Objective: answers questions, solves problems, decides
- Process: individualized
- Role of evaluator: key in the choice of tests
- Skills of evaluator: educated selection of tools; skilled
- Outcome: a logical problem-solving approach
TESTING
- Instruments that yield scores based on collected data (a subset of assessment)
- Objective: obtain some measure (numerical in nature) with regard to an ability/attribute
- Process: individualized or grouped
- Role of evaluator: may be substituted
- Skills of evaluator: technician-like skills
- Outcome: yields a test score or series of test scores

Technical quality refers to a test's psychometric soundness.
ITEM - a sample of the behavior of an individual.
SCALE - the process by which a response can be scored.

3 FORMS OF ASSESSMENT (T C D)
1. THERAPEUTIC PSYCHOLOGICAL ASSESSMENT - the patient gains insight about the disorder & later develops psychological wellness
2. COLLABORATIVE PSYCHOLOGICAL ASSESSMENT - the patient helps the clinician to uncover the disorder
3. DYNAMIC PSYCHOLOGICAL ASSESSMENT - follows a process (ABA design)
   a. Evaluation
   b. Therapy/intervention
   c. Evaluation

ASSESSMENT TOOLS (O P I)
1. OBSERVATION - monitoring the actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions
   a. Naturalistic observation - observing behaviors in the setting in which the behavior would typically be expected to occur
   b. Role-play test - a tool of assessment wherein examinees are directed to act as if they were in a particular situation
2. PSYCHOLOGICAL TESTING - a set of items used for testing/measuring/determining individual differences; the process of measuring psychology-related variables by means of a device
3. INTERVIEW - gathering information through direct communication; interviews differ in their purpose, length, and nature
   a. Panel interview - multiple interviewers
      i. Advantage: minimizes the idiosyncratic biases of a lone interviewer
      ii. Disadvantage: costly; the use of multiple interviewers may not even be justified
   b. Portfolio: a sample of one's ability and accomplishments
   c. Case history data: records, transcripts, and other accounts in written or pictorial form
      CASE STUDY - a report or illustrative account concerning a person or an event, compiled on the basis of case history data

TESTS - elements:
1. Content - the subject matter of the test
2. Format - the form, plan, structure, arrangement, and layout of test items
3. Administration procedures - administered on a one-to-one basis or by group
4. Scoring and interpretation
   a. Score - a code or summary statement that reflects an evaluation of performance on a test
   b. Scoring - the process of assigning such evaluative codes or statements to performance on tests

TYPES OF PSYCHOLOGICAL TESTS
1. NUMBER OF TEST TAKERS
   a. Individual
   b. Group
2. VARIABLE BEING MEASURED
   a. ABILITY
      i. ACHIEVEMENT
      ii. APTITUDE/PROGNOSTIC
      iii. INTELLIGENCE
   b. PERSONALITY
      i. OBJECTIVE/STRUCTURED
      ii. PROJECTIVE/UNSTRUCTURED
      iii. INTERESTS

MAXIMUM PERFORMANCE TESTS
SPEED TEST - homogeneous (easy) items; short time limit
POWER TEST - few items, but more complex

REFERENCE SOURCES - sources of authoritative information about published tests
Test catalogues - brief descriptions of tests
Test manuals - detailed information about a test
Reference volumes - one-stop shopping
Journal articles
Online databases

CHARACTERISTICS OF PSYCHOLOGICAL TESTING
1. Objective - free from subjective perception
2. Standardized - uniformity exists
3. Reliable - there is consistency in test results
4. Valid - the test measures what it purports to measure
5. Good predictive validity - test results suggest future behavior
ETHICS IN PSYCHOLOGICAL TESTING

ETHICAL CODE
Professional guidelines for appropriate behavior.
o American Counseling Association (2005)
o American Psychological Association (2003)
o Psychological Association of the Philippines (2009)

TEST SECURITY
The codes remind professionals that it is their responsibility to make reasonable efforts to ensure the integrity of test content and the security of the test itself. Professionals should not duplicate tests or change test materials without the permission of the publisher.

CHOOSING APPROPRIATE ASSESSMENT INSTRUMENTS
Ethical codes stress the importance of professionals choosing assessment instruments that show test worthiness, which has to do with the reliability, validity, cross-cultural fairness, and practicality of a test.

TEST SCORING & INTERPRETATION
The codes highlight the fact that when scoring tests and interpreting their results, professionals should reflect on how test worthiness (reliability, validity, cross-cultural fairness, and practicality) might affect the results. Professionals must take appropriate action when issues of test worthiness arise during an assessment so that the results of the assessment are not misconstrued.

MORAL ISSUES
- Human Rights
- Labeling
- Invasion of Privacy
- Divided Loyalties
- Responsibilities of Test Users, Test Publishers, and Test Constructors

COMPETENCE IN USING TESTS
Requires adequate knowledge of and training in administering an instrument. Competence to use tests accurately is another aspect stressed in the codes: professionals should have adequate knowledge about testing and familiarity with any test they may use.

THREE-TIER SYSTEM
LEVEL A - tests that can be administered, scored, and interpreted by responsible nonpsychologists who have carefully read the manual and are familiar with the overall purpose of testing. Ex. achievement tests, specialized aptitude tests.
LEVEL B - requires technical knowledge of test construction and use, plus appropriate advanced coursework in psychology and related courses (statistics, individual differences, and counseling). Ex. group intelligence tests, personality tests.
LEVEL C - requires an advanced degree in psychology or licensure as a psychologist, plus advanced training/supervised experience with the particular test. Ex. projective tests, individual intelligence tests, diagnostic tests.

CROSS-CULTURAL SENSITIVITY
An ethical guideline to protect clients from discrimination and bias in testing. The codes stress the importance of professionals being aware of and attending to the effects of age, color, cultural identity, disability, ethnicity, gender, religion, sexual orientation, and socioeconomic status on test administration and interpretation.

HUMAN RIGHTS
- Right to informed consent
- Right to know their test results and the basis of any decisions that affect their lives
- Right to know who will have access to test data, and right to confidentiality of test results

INFORMED CONSENT
Permission given by the client after the assessment process is explained. Informed consent involves the right of clients to obtain information about the nature and purpose of all aspects of the assessment process, and for clients to give their permission to be assessed.
NON-REQUIREMENT OF INFORMED CONSENT
- When mandated by law
- Testing as a routine educational, institutional, or organizational activity
- Evaluation of decisional capacity

CONFIDENTIALITY
An ethical guideline to protect client information. Whether conducting a broad assessment of a client or giving one test, keeping information confidential is a critical part of the assessment process and follows guidelines similar to how one would keep information confidential in a therapeutic relationship.
WHEN CONFIDENTIAL INFORMATION CAN BE REVEALED
1. If a client is in danger of harming himself or herself or someone else;
2. If the client is a minor and the law states that parents have a right to information about their child;
3. If a client asks you to break confidentiality (for example, your testimony is needed in court);
4. If you are bound by law to break confidentiality (for example, you are hired by the courts to assess an individual's capacity to stand trial);
5. To reveal information about your client to your supervisor in order to benefit the client;
6. When you have a written agreement from your client to reveal information to specified sources (for example, the court has asked you to send them a test report).

LABELING - Effects of labeling:
o Results in stigmatization
o Affects one's access to help
o Makes a person passive

PROPER DIAGNOSIS
Choose appropriate assessment techniques for accurate diagnosis. The codes emphasize the important role professionals play when deciding which assessment techniques to use in forming a diagnosis of a mental disorder, and the ramifications of making such a diagnosis.

INVASION OF PRIVACY
The codes generally acknowledge that, to some degree, all tests invade one's privacy, and highlight the importance of clients understanding how their privacy might be impinged upon.

DIVIDED LOYALTIES
Psychologists may be torn over whether their client is the institution or the person. Institutions should be informed only of what they need, or given an answer to the referral question only.

RELEASE OF TEST DATA
Test data are protected; the client's release is required. The codes assert that data should be released to others only if the client has given consent. The release of such data is generally given only to individuals who can adequately interpret the test data and who will not misuse the information.

TEST ADMINISTRATION
The codes reinforce the notion that tests should be administered in the manner in which they were established and standardized. Alterations to this process should be noted, and interpretations of test data adjusted, if the testing conditions were not ideal.

MORAL MODEL OF DECISION MAKING
AUTONOMY - respecting the client's right of self-determination and freedom of choice
NON-MALEFICENCE - ensuring that professionals do no harm
BENEFICENCE - promoting the well-being of others and of society
JUSTICE - equal and fair treatment of all people; being nondiscriminatory
FIDELITY - being loyal and faithful to your commitments in the helping relationship
VERACITY - dealing honestly with the client

RESPONSIBILITIES OF TEST USERS, PUBLISHERS, AND CONSTRUCTORS
- Use assessment instruments with samples similar to the standardization group (reliability, validity, established norms).
- Test users must possess knowledge of test construction and of the supporting research for any test they administer.
- Test developers should provide the psychometric properties of the test, specify scoring and administration procedures, and give a clear description of the normative sample.
NORMS AND STATISTICS

TWO TYPES OF STATISTICS
1. DESCRIPTIVE - used for making interpretations of test results; provide a concise description of quantitative information
2. INFERENTIAL - provide conclusions regarding a population based on observations of a sample

SCALES OF MEASUREMENT
1. NOMINAL - naming/labeling; one category does not suggest that another is higher or lower. Ex. gender, religion
2. ORDINAL - observations can be ranked in order, but the degree of difference is unobtainable. Ex. position in a company
3. INTERVAL - there is magnitude and equal intervals, but no true zero
4. RATIO - there is magnitude, equal intervals, and a true zero
*magnitude - "moreness"; suggests that one value is more than another
*equal interval - the difference between two points at any place on the scale has the same meaning as the difference between two other points elsewhere on the scale
*absolute zero - zero suggests the absence of the variable being measured
*most psychological data are ordinal by nature but are treated as interval
*IQ scores were initially for classification, not measurement (as noted by Binet)

MEASURES OF CENTRAL TENDENCY - statistics that indicate the average or midmost score between the extreme scores in a distribution
MEAN - the most appropriate measure of central tendency for interval and ratio data when the distribution is normal
MEDIAN - the middle score in a distribution
MODE - the most frequently occurring score in a distribution

MEASURES OF VARIABILITY - indicate how scattered the scores in a distribution are; how far one score is from another; the dispersion of the scores
Range - the difference between the highest and lowest scores
INTERQUARTILE AND SEMI-INTERQUARTILE RANGE
Quartiles - points that divide the distribution into 4 equal parts
Interquartile range - the difference between Q3 and Q1; represents the middle 50% of the distribution
Semi-interquartile range - (Q3 - Q1)/2
STANDARD DEVIATION - an approximation of the average deviation around the mean; details how far above or below the mean a score lies

FREQUENCY DISTRIBUTION - displays scores on a variable or measure to reflect how frequently each value was obtained
*GRAPH - a diagram or chart illustrating data
Histogram - a graph with vertical bars at the true limits of each test score; connected bars; used for continuous data
Bar graph - used for describing frequencies; disconnected bars
Frequency polygon - points plotted at the class mark of each interval; continuous lines

NORMAL DISTRIBUTION - the majority of test takers bunch in the middle of the distribution; very few test takers are at the extremes
POSITIVELY SKEWED - more test takers got a low score. Mean > median > mode
NEGATIVELY SKEWED - more test takers got a high score. Mode > median > mean

KURTOSIS - the steepness of a distribution
PLATYKURTIC - flat; the number of test takers with high and low scores is not far from the number scoring around the mean
LEPTOKURTIC - peaked; the number of test takers with high and low scores is far from the number scoring around the mean
MESOKURTIC - in the middle; the distribution is deemed normal

DECILE - points that divide the distribution into 10 equal parts: D1-D9

STANDARD SCORES - raw scores converted from one scale to another. Provide a context for comparing scores on different tests by converting the scores from each test into z-scores.
Z-SCORE - Mean = 0; SD = 1. A "zero plus or minus one" scale; once determined, can be used to translate one scale into another
T-SCORE - Mean = 50; SD = 10. Created by McCall in honor of his professor Thorndike
STANINE - Mean = 5; SD = 2. Used by the US Air Force for assessment; takes whole numbers 1-9, no decimals
DEVIATION IQ - Mean = 100; SD = 15. Used for interpreting IQ
STEN - "standard ten." Mean = 5.5; SD = 2
GRE/SAT - Mean = 500; SD = 100. Used for admission to graduate school and college

LINEAR TRANSFORMATION - a formula derived from the z-score that transforms a score from one scale to another: NS = SD(z) + M (see the sketch at the end of this section)

PERCENTILE RANK
Tells the relative position of a test taker in a group of 100; suggests how many scores fall below a specified score.
For example, if a person scores at the 50th percentile, 50 percent of the test takers fall below that specific score.

NORMS - the performance of defined groups on a particular test; a transformation of raw scores used to make meaningful interpretations of test scores
NORMING - the process of creating norms
NORMATIVE SAMPLE - the group of people whose performance on a particular test is analyzed and referred to
RACE NORMING - norming based on race/culture
USER NORMS - norms provided in test manuals
NORMAN - the person who constructs a norm
CRITERION-REFERENCED - interpretation of the test is based on a certain standard
NORM-REFERENCED - the score is interpreted based on the performance of a standardization group
1. DEVELOPMENTAL NORMS - indicate how far along the normal developmental path an individual has progressed: AGE NORMS, GRADE NORMS, ORDINAL SCALES
2. WITHIN-GROUP NORMS - the individual's performance is evaluated in terms of the performance of the most nearly comparable standardization group:
   a. PERCENTILE
   b. STANDARD SCORE
   c. DEVIATION IQ
3. NATIONAL NORMS - norms based on large-scale samples:
   a. SUBGROUP NORMS
   b. LOCAL NORMS

CORRELATION - statistical tools for testing the relationship between variables
COVARIANCE - how much two scores vary together
CORRELATION COEFFICIENT - a mathematical index that describes the direction and magnitude of a relationship
o Ranges from -1.00 to +1.00
o The nearer to 1, the stronger the relationship
o The nearer to 0, the weaker the relationship
o The sign suggests the type of relationship (negative = indirect relationship; positive = direct relationship)
CORRELATIONAL STATISTICS
o PEARSON PRODUCT-MOMENT CORRELATION - 2 variables on an interval/ratio scale
o SPEARMAN RHO - correlates 2 variables on an ordinal scale; also called rank-order correlation
o BISERIAL CORRELATION - 1 continuous and 1 artificially dichotomous variable (a dichotomy in which there are other possibilities within a category)
o POINT-BISERIAL CORRELATION - 1 continuous and 1 truly dichotomous variable (a dichotomy in which there are only two possible categories)
o PHI COEFFICIENT - 2 dichotomous variables; at least 1 true dichotomy
o TETRACHORIC COEFFICIENT - 2 dichotomous variables; both artificial dichotomies
o COEFFICIENT OF ALIENATION - a measure of non-association between two variables
o COEFFICIENT OF DETERMINATION - the percentage of variance shared by two variables; the effect of one variable on another. If r = 0.75, then r^2 = 0.56

REGRESSION (Y' = a + bX)
Intercept (a) - the point at which the regression line crosses the Y axis
Regression coefficient (b) - the slope of the regression line
Regression line - the best-fitting straight line through a set of points in a scatter plot
Standard error of estimate - measures the accuracy of prediction
MULTIPLE REGRESSION - a statistical technique for predicting one variable from a series of predictors; used to find linear combinations of three or more variables; applicable only when the data are all continuous
STANDARDIZED REGRESSION COEFFICIENTS - also called beta weights; tell how much each variable in a given list of variables predicts a single variable
FACTOR ANALYSIS - used to study the interrelationships among a set of variables
Factors - variables; also called principal components
Factor loading - the correlation between the original variable and the factor; depicted through beta weights
META-ANALYSIS - a family of techniques used to statistically combine information across studies to produce single estimates of the data under study
Effect size - the estimate of the strength of a relationship or the size of differences; evaluated through the correlation coefficient
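A minimal sketch of the linear transformation NS = SD(z) + M, converting a raw score onto the standard-score scales listed above; the raw data are hypothetical.

```python
from statistics import mean, pstdev

scores = [85, 90, 95, 100, 105, 110, 115]  # hypothetical raw scores
m, sd = mean(scores), pstdev(scores)

def to_scale(raw, new_mean, new_sd):
    z = (raw - m) / sd            # z-score: mean 0, SD 1
    return new_sd * z + new_mean  # NS = SD(z) + M

raw = 110
print(to_scale(raw, 50, 10))    # T-score (M=50, SD=10)
print(to_scale(raw, 100, 15))   # Deviation IQ (M=100, SD=15)
print(to_scale(raw, 5.5, 2))    # Sten (M=5.5, SD=2)
print(to_scale(raw, 500, 100))  # GRE/SAT-type scale (M=500, SD=100)
```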
Effect size the estimate of the strength of relationship or size of differences. Evaluated through correlation coefficient
ITEM ANALYSIS AND ITEM CONSTRUCTION

ITEM WRITING GUIDELINES:
- Define clearly what you want to measure
- Generate an item pool
- Avoid long items
- Keep the level of reading difficulty appropriate for those who will complete the test
- Avoid double-barreled items (more than one idea in one item)
- Consider mixing positively and negatively worded items

ITEM FORMAT - the form, plan, structure, arrangement, and layout of individual test items.
I. SELECTED-RESPONSE FORMAT - the test taker selects a response from a set of alternatives
   a. DICHOTOMOUS FORMAT - offers 2 alternatives for each item. ADVANTAGES: simplicity, easy administration, quick scoring, no neutral response. DISADVANTAGES: needs more items; 50% chance of guessing the correct answer; test takers can memorize responses
   b. POLYCHOTOMOUS FORMAT - more than 2 alternatives. Ex. multiple choice
      i. Question - stem
      ii. Correct choice - keyed response
      iii. Distractors - incorrect choices
      iv. Cute distractors - less likely to be chosen; may affect the reliability of the test
   c. LIKERT FORMAT - requires the respondent to indicate degree of agreement with an attitudinal statement; a superior item format; uses factor analysis; can be a 5-, 4-, or 6-choice format (without a neutral point); negative items are reverse-scored, then all scores are summed
   d. CATEGORY FORMAT - the respondent rates a construct from 1 to 10 (1 = lowest, 10 = highest)
   e. CHECKLIST - the subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself
   f. Q-SORT - requires respondents to sort a group of statements into 9 piles
   g. GUTTMAN SCALE - items are arranged from weaker to stronger expressions of the attitude, belief, or feeling being measured
II. COMPLETION ITEMS - the test taker completes a stimulus
   a. ESSAY ITEMS - the respondent answers a question by writing a composition; used to determine the depth of the respondent's knowledge

EQUAL-APPEARING INTERVAL
- Described by Thurstone
- A scale in which both + and - items are present
- Adds all responses in order to transform the scale into an interval scale
- Uses direct estimation scaling
  o Direct estimation scaling - transformation to other scales is possible because the mean is computable
  o Indirect estimation scaling - cannot be transformed to other scales because the mean is not available

COMPUTER ADAPTIVE TESTING - also called computer-assisted testing; an interactive, computer-administered test-taking process wherein the items presented to the test taker are based in part on the test taker's performance on previous items
ITEM BANK - a relatively large and easily accessible collection of test questions
ITEM BRANCHING - the ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items

SCORING ITEMS
I. CUMULATIVE MODEL - the higher the score on the test, the higher the test taker stands on the ability, trait, or other characteristic measured
II. CLASS/CATEGORY SCORING - test taker responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is similar in some way; most useful in diagnostic tests
III. IPSATIVE SCORING - compares a test taker's score on one scale within a test to another scale within that same test

ITEM ANALYSIS - a general term for a set of methods used to evaluate test items; one of the most important aspects of test construction. (A short computational sketch follows this section.)
I. ITEM DIFFICULTY - for achievement/ability tests, defined by the proportion of people who answer the item correctly (so it actually indicates the easiness of the item); should range from 0.30 to 0.70. Multiple-choice achievement items with four alternatives carry a 0.25 chance of guessing the correct response.
   a. Optimum item difficulty - the best difficulty for an item given the number of response alternatives:
      i. OID = (chance performance + 1.00) / 2
      ii. Chance performance - performance based on guessing alone; equal to 1 divided by the number of alternatives
   b. Item difficulty index - the value that describes item difficulty for an ability test
   c. Item endorsement index - the value that describes the percentage of individuals who endorsed an item on a personality test
   d. Omnibus spiral format - items in an ability test are arranged in increasing difficulty
      i. Giveaway items are presented near the beginning of the test to spur motivation and lessen test anxiety
II. ITEM RELIABILITY - indicates the internal consistency of a test; the higher the index, the higher the internal consistency
   a. Item reliability index = (SD of the item) x (item-total correlation)
   b. Factor analysis can also be used to determine which items load most heavily on the whole test
III. ITEM VALIDITY - indicates the degree to which a test measures what it purports to measure; the higher the item-validity index, the higher the criterion-related validity of the test
   a. Item validity index = (item standard deviation) x (correlation between item and criterion)
IV. ITEM DISCRIMINABILITY - how well an item performs in relation to some criterion; how adequately an item separates high scorers from low scorers on the entire test; the usual lower limit is a 0.30 discrimination index. The higher the d, the more high scorers answer the item correctly.
   a. Extreme group method - compares people who have done well with those who have done poorly on the test
   b. Point-biserial method - correlates a dichotomous variable with a continuous one: whether those who got an item correct also tend to have high total scores
V. ITEM CHARACTERISTIC CURVE - a graphic representation of item difficulty and discrimination; usually plots total scores on the x-axis and p and d on the y-axis
VI. ITEMS FOR CRITERION-REFERENCED TESTS - a frequency polygon is created after the test is given to two groups: one exposed to the learning unit and one not exposed
   a. Antimode - the score with the lowest frequency
   b. Used to determine the cut score (passing score) for a criterion-referenced test
VII. DISTRACTOR ANALYSIS
VIII. ISSUES AMONG TEST ITEMS
   a. ITEM FAIRNESS - the degree to which an item is biased
      i. Biased test items - items that favor one particular group of examinees; can be tested using inferential statistics across groups
   b. QUALITATIVE ITEM ANALYSIS - exploration of issues through verbal means such as interviews and group discussions conducted with test takers and other relevant parties
   c. THINK-ALOUD ADMINISTRATION - allows test takers (during standardization) to speak their minds while taking the test; used to shed light on the test taker's thought processes during administration
   d. EXPERT PANELS - guide researchers/test developers in conducting a sensitivity review (especially on cultural issues)
      i. Sensitivity review - a study of test items, typically to examine test bias and the presence of offensive language and stereotypes
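A minimal sketch of the item statistics described above: item difficulty p, optimum difficulty, and an extreme-group discrimination index. The scored responses are hypothetical, and the split into top and bottom halves is one simple choice (texts often use the upper and lower 27% instead).

```python
# hypothetical scored responses: rows = 10 examinees, columns = 3 items (1 = correct)
X = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [0, 1, 0], [1, 1, 1],
     [0, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]]

n = len(X)
totals = [sum(row) for row in X]

# Item difficulty: proportion of examinees answering each item correctly
p = [sum(row[j] for row in X) / n for j in range(len(X[0]))]

# Optimum item difficulty for a 4-alternative multiple-choice item:
# chance performance = 1/4 = 0.25, so OID = (0.25 + 1.00) / 2 = 0.625
oid = (0.25 + 1.00) / 2

# Extreme-group discrimination index: p(upper group) - p(lower group),
# ranking examinees by total score and splitting into halves
order = sorted(range(n), key=lambda i: totals[i])
lower, upper = order[: n // 2], order[n // 2:]
d = [sum(X[i][j] for i in upper) / len(upper)
     - sum(X[i][j] for i in lower) / len(lower)
     for j in range(len(X[0]))]

print(p, oid, d)
```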
TEST DEVELOPMENT - an umbrella term for everything that goes into the process of creating a test.
I. TEST CONCEPTUALIZATION - the stage at which the idea for a particular test is conceived. The following are determined: construct, goal, user, taker, administration, format, response, benefits, costs, interpretation, and whether the test will be norm-referenced or criterion-referenced.
   a. Pilot work - may take the form of interviews to determine appropriate items for the test
II. TEST CONSTRUCTION - writing test items, formatting items, setting scoring rules, and otherwise designing and building the test.
   a. Scaling - the process of setting rules for assigning numbers in measurement; manifested through the item format (dichotomous, polytomous, Likert, category)
   b. Item pool - usually 2 times the number of items intended for the final form; 3 times is more advisable
III. TEST TRYOUT - administration of the test to a representative sample of test takers. Issues:
   a. Determination of the target population
   b. Determination of the number of participants for the tryout (number of items multiplied by 10)
   c. The tryout should be executed under conditions as identical as possible to the conditions under which the standardized test will be administered
IV. ITEM ANALYSIS - entails procedures, usually statistical, designed to explore how individual test items work as compared to other items and in the context of the whole test (validity, reliability, item difficulty, and discrimination)
V. TEST REVISION - balancing the strengths and weaknesses of the test and of individual items.
   a. Norming - done after the test has been revised to acceptable levels of reliability, validity, and item indices
TEST ADMINISTRATION

ISSUES IN TEST ADMINISTRATION
- The Examiner and the Subject
- Training of the Test Administrator
- Mode of Administration
- Subject Variables
- Behavioral Assessment Issues

EXAMINER AND THE SUBJECT - the relationship between the examiner and the test taker
- On the Wechsler Intelligence Scale for Children (WISC), enhanced rapport has been found to increase scores.
- Faulty response styles:
  o Acquiescent response style - the tendency toward increased agreement when responding in a test or interview; most responses to test items are positive regardless of item content
  o Socially desirable response style - presenting oneself in a favorable or socially desirable way
- Language of the test taker - test takers proficient in two or more languages should be tested in the language in which they are most comfortable.
- Race of the test taker - the examiner's race can have significant effects on the subject's responses.

TRAINING OF THE TEST ADMINISTRATOR
- Different assessment procedures require different levels of training.
- According to research, at least 10 practice sessions are needed to gain competency in scoring the WAIS-R.

MODE OF ADMINISTRATION
- Self-administered measures show lower results than psychologist-administered measures.
- Telephone interviews show better reported health than self-administered interviews.

SUBJECT VARIABLES
I. TEST ANXIETY - anxiety over test performance (worry, emotionality, lack of self-confidence)
II. ILLNESS - disease influences test-taking behavior and performance (malingerers)
III. HORMONES - hormonal imbalance affects mood cycles and thus performance on a test
IV. MOTIVATION - those required to take a test as an occupational requirement tend to produce unreliable results

ERRORS OF BEHAVIORAL ASSESSMENT
I. REACTIVITY - being evaluated increases performance; also called the Hawthorne effect
II. DRIFT - moving away from what one has learned toward idiosyncratic definitions of behavior; suggests that observers should be retrained from time to time
   a. CONTRAST EFFECT - the tendency to rate the same behavior differently when observations are repeated in the same context
III. EXPECTANCIES - the tendency for results to be influenced by what test administrators expect to find
   a. Rosenthal effect - the test administrator's expected results influence the actual result of the test
   b. Golem effect - negative expectations from the test administrator decrease performance
IV. RATING ERRORS - judgments resulting from the intentional or unintentional misuse of a rating scale
   a. Halo effect - the tendency to ascribe positive attributes independently of the observed behavior; described by Thorndike
   b. Leniency/generosity error - the rater's tendency to be too forgiving and insufficiently critical
   c. Severity error - evaluation that is overly critical
   d. Central tendency error - the rater is reluctant to give ratings at either the positive or negative extreme, so ratings tend to cluster in the middle of the continuum
   e. General standoutishness - the tendency to judge on the basis of one outstanding characteristic

TEST UTILITY

USES OF TESTS
- Classification - assigning a person to one category rather than another
- Screening - quick, simple tests or procedures to identify persons who might have special characteristics or needs
- Placement - sorting persons into different programs appropriate to their needs or skills
- Selection - a process whereby each person evaluated for a position is either accepted or rejected for that position
- Diagnosis and treatment planning - determination of abnormal behavior; classifying using diagnostic criteria; a precursor to the recommendation of treatment for personal distress
- Self-knowledge - understanding one's own intelligence and personality characteristics
- Program evaluation - systematic assessment and evaluation of educational and social programs
- Research - measuring variables to suggest correlational and causal relationships

UTILITY - the usefulness or practical value of testing; efficiency
- PSYCHOMETRIC SOUNDNESS - tests should be reliable and valid to be useful; reliability sets the limit for validity (the upper boundary of validity is reliability)
- COST - disadvantages, losses, or expenses, in both economic and noneconomic terms, associated with testing or not testing
  o ECONOMIC COST - monetary expenses (personnel, test protocols, testing venues, etc.)
  o NONECONOMIC COST - intangible losses (e.g., loss of trust from patrons due to unqualified personnel)
- BENEFIT - profits, gains, or advantages of testing or not testing
  o ECONOMIC BENEFIT - monetary benefits (e.g., a highly qualified, extroverted salesperson reaching quotas translates to financial gains)
  o NONECONOMIC BENEFIT - increases in the quality and quantity of workers' performance

UTILITY ANALYSIS - a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment
- Test comparison
- Assessment tool comparison
- Addition of a test/assessment tool
- Determination of whether not testing is preferable

APPROACHES TO UTILITY ANALYSIS (see the sketch after this list)
I. EXPECTANCY TABLES - show the percentage of people within specified test-score intervals who were subsequently placed in various categories of the criterion
   a. TAYLOR-RUSSELL TABLES - statistical tables once extensively used to provide test users with an estimate of the extent to which including a particular test in the selection system would improve selection decisions
      i. SELECTION RATIO - the ratio of the number of people to be hired to the number of applicants
      ii. BASE RATE - the percentage of people hired who would be expected to be successful in their jobs without use of the test
   b. NAYLOR-SHINE TABLES - indicate the mean difference between the newly selected group and the standard/unselected group
II. BROGDEN-CRONBACH-GLESER (BCG) FORMULA - calculates the dollar amount of the utility gain resulting from the use of a particular selection instrument under specified conditions
   a. UTILITY GAIN - an estimate of the benefit of using a particular test
   b. PRODUCTIVITY GAIN - an estimated increase in work output
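A minimal sketch of one common formulation of the BCG utility gain; all parameter values in the example are hypothetical.

```python
def bcg_utility_gain(n_selected, tenure_years, validity, sd_dollars,
                     mean_z_selected, cost_per_applicant, n_applicants):
    """Brogden-Cronbach-Gleser utility gain (a common formulation):
    gain = N * T * r_xy * SD_y * Z_m  -  cost of testing,
    where N = number hired, T = average tenure in years, r_xy = the test's
    validity coefficient, SD_y = SD of job performance in dollars, and
    Z_m = mean standardized test score of those hired."""
    benefit = n_selected * tenure_years * validity * sd_dollars * mean_z_selected
    testing_cost = cost_per_applicant * n_applicants
    return benefit - testing_cost

# hypothetical selection scenario: hire 10 of 100 applicants
print(bcg_utility_gain(n_selected=10, tenure_years=2, validity=0.40,
                       sd_dollars=10_000, mean_z_selected=1.2,
                       cost_per_applicant=50, n_applicants=100))
```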
INTERVIEW - a method of gathering information through talk, discussion, or direct questioning.

TYPES OF INTERVIEW (BY STYLE)
I. DIRECTIVE INTERVIEW - the interviewer directs, guides, and controls the course of the interview.
II. NONDIRECTIVE INTERVIEW - the interviewee guides the interview process.
III. SELECTION INTERVIEW - designed to elicit information pertaining to an applicant's qualifications and capabilities for particular employment duties.
IV. SOCIAL FACILITATION INTERVIEW - the interviewer serves as a model for the interviewee.

TYPES OF INTERVIEWS (BY PURPOSE)
1. INTAKE INTERVIEW - entails detailed questioning about the presenting complaints
2. DIAGNOSTIC INTERVIEW - assignment of a DSM diagnosis
3. STRUCTURED - a predetermined, planned sequence of questions that the interviewer asks the client
4. UNSTRUCTURED - no predetermined plan of questions
5. SEMI-STRUCTURED - usually starts unstructured, followed by structured questions targeting a diagnostic classification
6. MENTAL STATUS EXAMINATION (MSE) - a quick assessment of how the client/patient is functioning at the time of the evaluation
7. CRISIS INTERVIEW - usually for suicide or abuse cases
8. CASE HISTORY INTERVIEW - discusses the developmental stages of the patient

PRINCIPLES OF EFFECTIVE INTERVIEWING
I. PROPER ATTITUDE
   a. INTERPERSONAL INFLUENCE - the degree to which one person can influence another
   b. INTERPERSONAL ATTRACTION - the degree to which people share a feeling of understanding, mutual respect, similarity, and the like
II. RESPONSES TO AVOID
   a. JUDGMENTAL STATEMENTS - evaluating the thoughts, feelings, or actions of another
   b. PROBING STATEMENTS - demanding more information than the interviewee wishes to provide voluntarily
   c. HOSTILE STATEMENTS
   d. FALSE ASSURANCE
III. EFFECTIVE RESPONSES
   a. OPEN-ENDED QUESTIONS
   b. SUMMARIZING
   c. TRANSITIONAL PHRASES
   d. CLARIFICATION RESPONSES
   e. PARAPHRASING AND RESTATEMENT
   f. EMPATHY & UNDERSTANDING

SOURCES OF ERROR IN THE INTERVIEW
I. INTERVIEW VALIDITY
   a. HALO EFFECT
   b. GENERAL STANDOUTISHNESS
   c. CULTURAL DIFFERENCES
   d. INTERVIEWER BIAS
II. INTERVIEW RELIABILITY
   a. MEMORY AND HONESTY OF THE INTERVIEWEE
   b. CLERICAL CAPABILITIES OF THE INTERVIEWER

MEASURING UNDERSTANDING
LEVEL 1 - little or no relationship to the interviewee's response
LEVEL 2 - communicates a superficial awareness of the meaning of a statement
LEVEL 3 - interchangeable with the interviewee's statement
LEVEL 4 - communicates empathy and adds minimal information/ideas
LEVEL 5 - communicates empathy and adds major information/ideas