Psychological Testing and Research Methods

1. In the language of psychological testing, reliability refers to

a. the lack of systematic errors
b. how well a test measures what it is intended to measure under specified conditions
c. the proportion of total variance that can be attributed to true variance
d. whether or not a test’s publisher consistently publishes high-quality instruments
2. Detailed information regarding how a particular test was developed is typically found
in
a. the current test catalogue distributed by the test’s publisher
b. the Standards for Educational and Psychological Tests
c. a review of the test published in a journal
d. the test manual
3. The Likert scale is an example of which type of rating scale?

a. content
b. summative
c. categorical
d. paired methods
4. Validity is to ________, as utility is to ___________.

a. usefulness; accuracy
b. accuracy; usefulness
c. consistency; accuracy
d. usefulness; consistency
5. When the selection ratio goes down:
a. none of these
b. hiring becomes less selective
c. top-down selection policy can become discriminatory
d. competition for the position is likely to increase
6. Item branching refers to:

a. reusing items in an original test that were originally developed for use in a parallel test.
b. statistical efforts to ensure that items translated into foreign languages are of the
same difficulty.
c. administering certain test items on a test depending on the test-takers’ responses to
previous test items.
d. the creation of alternate and parallel forms of tests based on a group of test-takers’
responses to the original test.
7. Critics have argued that projective tests are too

a. brief
b. subjective
c. concrete
d. qualitative
8. A 40-item vocabulary test was administered to a group of students. A second, similar
test of vocabulary terms was administered to this same group of students approximately
one week later. The researcher reported that the correlation between these two tests
was r = .90. What type of reliability is represented in this example?
a. alternate forms
b. test-retest
c. inter-rater
d. split-half
9. A researcher conducted a study to determine the effects of gender and status on the
perceived credibility of an eyewitness testifying in a trial. Participants watched one of
four video recordings depicting the eyewitness and rated the credibility of the
eyewitness.
What type of design was used in the study?
a. multivariate correlational
b. between- and within-subjects
c. within-subjects
d. between-subjects
10. The standard deviation of a sample of test scores is a measure of the

a. normality of the distribution
b. concurrent validity of the test
c. variability of individual scores
d. central tendency of scores
11. Which of the following is NOT an assumption of utility analysis?

a. the value of people and their performance can be estimated
b. psychological tests are always preferred over other means of assessment
c. large amounts of information can be integrated to make good decisions
d. the performance of people in an organization can affect organizational viability
12-14. A researcher was interested in whether or not jazz vocals and opera influence
men’s and women’s emotional states. She hypothesized that these types of music
influence men and women differently. In a study investigating this hypothesis, 40 men
and 40 women heard a jazz piece, and 4 men and 4 women heard an operatic piece.
The jazz piece was sung by a man, and the operatic piece was sung by a woman.
Afterward, participants rated themselves on an inventory measuring emotional state.
Higher scores on the inventory indicate positive mood. Results of this study are
presented in the graph below:
12. Which of the following describes the pattern of findings displayed in the graph?
a. Women scored higher than men on the mood inventory regardless of the type of
music they heard.
b. Men who heard the jazz piece and women who heard the operatic piece scored
higher on the mood inventory than those in the other two groups
c. women who heard the jazz piece and the men who heard the operatic piece scored
higher on the mood inventory than those in the other two groups.
d. men scored higher than women on the mood inventory regardless of the type of
music they heard.
13. The researcher concludes from her study that jazz music positively changes men’s
mood and operatic music positively changes women’s moods. Which of the following
invalidates the conclusion?
a. men’s and women’s moods were not measured before exposure to the two types of
music.
b. previous studies have shown that men are less emotional than women
c. men and women were randomly assigned to the groups
d. only one scale was used to measure mood
14. Which of the following is the most serious problem with the methodology of this
research?
a. The sample size was too small to draw a valid conclusion
b. men and women did not listen to both types of music
c. only one type of music should have been used
d. the singers were not the same gender
15. A recent article in an educational journal described a university at which the average
age is 26. This article also mentioned that 38 percent of the students are over 25 years
of age. What can be concluded from this information?
a. the distribution must be skewed
b. the standard deviation must be relatively small
c. the median age must be greater than the mean age.
d. the distribution must be bimodal
16. Conducting a study by analyzing Philippine census data from previous years is an
example of using which of the following research approaches?
a. descriptive
b. surveys
c. case history
d. archival analysis
17-18. Depression is more common among people with insomnia than among those
with satisfactory sleep. To determine the reasons for this relationship, investigators
identified 40 people suffering from both depression and insomnia. For each of these 40,
they paired two other people of the same gender and age who were neither depressed
nor suffering from any sleep disorder. One of these was designated the “normal-sleep
control”, and the other was designated the “yoked control”. All participants slept in
a laboratory for one week. The normal-sleep control person slept without restrictions.
During that same time, the yoked control person was permitted to sleep when the
depressed-insomniac person slept but was required to awaken whenever the
depressed-insomniac person was awakened.
A valid questionnaire for measuring depression was administered at the end of the one-
week study. Assume that higher scores on the questionnaire reflect greater depressive
symptomatology.
17. What pattern of results on the depression questionnaire would justify the conclusion
that sleeplessness leads to depression?
a. normal sleep control = yoked control < depressed
b. normal sleep control < yoked control = depressed
c. yoked control < normal sleep control = depressed
d. yoked control < normal sleep control < depressed
18. Suppose that the results were consistent with the hypothesis that sleeplessness
does NOT lead to depression. Of the following, which would be the most serious
criticism of the study and its conclusion?
a. One week of sleep deprivation may have been inadequate to produce depression
b. the study failed to examine other factors that might also contribute to depression
c. the yoked-control group was unnecessary
d. the normal sleep control group was unnecessary
19. A researcher conducted a study to determine the effects of gender and status on the
perceived credibility of an eyewitness testifying in a trial. Participants watched one of
four video recordings depicting the eyewitness and rated the credibility of the
eyewitness.
In order to determine whether gender, as a specific variable, had an effect on perceived
credibility of the eyewitness, which of the following must be significant?
a. the interaction between gender and status
b. the main effect of status
c. the main effect of gender
d. post-hoc analysis of gender
20. Melody exclaims, “I got a C- on the statistics exam, and I was miserable until I
thought how terrible it must be for those who got F’s.” Melody’s attitude is an example of
which of the following?
a. social comparison
b. social anxiety
c. social validation
d. social learning
21. A patient is administered the Minnesota Multiphasic Personality Inventory-2-RF
(MMPI-2-RF) by an experienced clinician. The clinician concludes that the patient has
schizophrenia. The clinician’s diagnosis best supports which of the following additional
conclusions?
a. the patient’s pattern of responses to the MMPI-2-RF resembles that of people who
are known to have schizophrenia
b. a brief interview with the patient would reveal that the patient harbors delusions of
grandeur
c. the clinician’s interpretation of the MMPI-2-RF findings is based on knowledge of
projective testing
d. the patient received a high score on the lie scale of the MMPI-2-RF
22. Which of the following tests measures ability, intellect, and knowledge?
a. Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV)
b. Minnesota Multiphasic Personality Inventory-2-RF (MMPI-2-RF)
c. Myers-Briggs Type Indicator (MBTI)
d. Strong Interest Inventory
23. Research by Solomon Asch supports which of the following?
a. conformity increases as group size increases from two people to four or five people
b. higher levels of conformity are found in individualistic societies than in collectivistic
societies
c. individuals will follow orders to shock innocent strangers
d. the presence of one dissenter in a group is not strong enough to reduce conformity.
24. A group of researchers was interested in learning whether a newly developed exam
would be useful in determining whether a student will be successful in college. The
researchers designed a study in which students took the new exam prior to entering
college. Later, the students took another exam, which was designed to measure how much
information they had learned during their first year. The score on this exam was then
correlated with the student’s score on the newly developed exam. What type of validity
was being evaluated in the study?
a. predictive
b. discriminant
c. divergent
d. concurrent
25. A psychologist wishes to compare the performance of an experimental group and a
control group on a continuous measure. Which of the following would be the most
typical way to make this comparison?
a. conducting a chi-square test
b. computing a single correlation coefficient
c. conducting a t-test on the two means
d. computing a multiple correlation
26-27. In a study of a new psychopharmacological treatment for clinical depression, 40
participants diagnosed with depression each received four (4) different amounts of a
new medication called Deplow. During the first week, they were given a placebo. During the
second week of the study, they took 1 mg of Deplow each day. During the third week,
they took 3 mg of Deplow each day, and during the fourth week, they took 5 mg of
Deplow each day. Although the participants took different amounts of medication each
week, they were not informed about the amount they were taking. The participants also
completed a depression symptom checklist at the end of each week. Results are
presented below. Scores on the checklist could range from 0 to 30, with higher scores
indicating more severe depression. Assume statistical significance for differences
greater than 3.0.
Week of Study    Treatment    Mean Depression Score
1                Placebo      22.5
2                1 mg         23.2
3                3 mg         19.9
4                5 mg         14.5
26. Which of the following effects is the most serious limitation of this study?
a. ceiling
b. carryover
c. cohort
d. selection
27. What type of design was used in this study?

a. single factor within subjects
b. multifactor between subjects
c. single factor between subjects
d. cross-sectional
28. Dr. Chen is interested in the feminist attitudes of young adult
women in the United States. Consequently, she administered a feminist attitude
questionnaire to a total of 100 young adult women from three universities. The 100
women tested and the number of young adult women in the United States are which of
the following, respectively?
a. effect size and population
b. random assignment and random selection
c. independent and dependent variables
d. sample and population
29. In the language of psychological testing and assessment, scoring refers to assigning
evaluative numbers, codes, or statements to performance on:
a. interviews
b. tests
c. all of these
d. tasks
30. If the results of an examination are negatively skewed, the exam questions were
likely:
a. easy
b. biased
c. difficult
d. quite novel in many respects
31. In a distribution that is symmetrical, which of the following is true?

a. the distances from Q1 and Q3 to the median are the same
b. the distances from Q2 and Q3 to the median are the same
c. the distances from Q1 and Q4 to the median are the same
d. the distances from Q1 and Q2 to the median are the same
32. An anchor protocol is:

a. a list of guidelines for a standard test used to ensure that all test-takers are similar in
key ways to the population of the original standardization sample.
b. a model for scoring and a mechanism for resolving scoring discrepancies
c. a statistical procedure in which weights are assigned to each item of a model test to
maximize predictive validity
d. a previously developed test with known validity that can be used as a comparison for
a newly developed test
33. If a new test was developed to assist a college in selecting applicants, which group
of test-takers should ideally be administered the test items during the
item tryout phase of the new test’s development?
a. seniors in high school who were accepted to the college on the basis of criteria other
than the test under development
b. all high school juniors who are college-bound
c. freshmen admitted to college who had taken one or more advanced placement
courses in high school
d. college students who put a hold on their academic studies in order to backpack
through Europe for 1 year or more
34. The higher the item-difficulty index, the _____ the item
a. easier
b. less robust
c. more robust
d. harder
35. The item-validity index is key in determining:

a. criterion-related validity
b. construct validity
c. content validity
d. all of these
36. An item-reliability index provides a measure of a test’s:
a. stability
b. internal consistency
c. test-retest reliability
d. all of these
37. The greater the magnitude of the item-discrimination index:
a. the more valid the test
b. the more people in the higher-scoring group answered the item correctly as
compared with those in the lower-scoring group
c. the more people in the lower-scoring group answered the item correctly as compared
with those in the higher-scoring group
d. the more reliable the test
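A minimal sketch of the arithmetic behind the item-discrimination index, assuming the common textbook form d = (U - L) / n, where U and L are the numbers of test-takers in the upper- and lower-scoring groups who answered the item correctly and n is the size of each group. The function name and the counts are illustrative assumptions, not part of the exam.

```python
def item_discrimination(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Item-discrimination index d = (U - L) / n (assumed textbook form)."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    # Positive d: more of the upper-scoring group got the item right.
    # Negative d: more of the lower-scoring group got the item right.
    return (upper_correct - lower_correct) / group_size


# Hypothetical counts, chosen only for illustration.
print(item_discrimination(upper_correct=20, lower_correct=16, group_size=32))
print(item_discrimination(upper_correct=30, lower_correct=10, group_size=32))
```

Comparing the two printed values shows how the index moves as the gap between the groups widens.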
38. As part of the test development process, a test revision may entail:
a. Development of a new edition of a test
b. rewording, deletion, or development of new items
c. rewording, deletion, or development of new items; and development of a new edition
of a test
d. the reprinting of a test
39. Which is NOT a typical question that is raised and answered during the test
conceptualization stage of test development?
a. what is the objective of the test?
b. how valid are the items on the test?
c. what types of responses will be required of the test-taker?
d. is there a need for the test?
40. A test developer designs a test for the sole purpose of identifying the most highly
skilled individuals among those tested. During the test revision stage of test
development, the test developer will be particularly interested in:
a. item reliability
b. item discrimination
c. item validity
d. item bias
41. You are interested in developing a test for social adjustment in a college fraternity or
sorority. You begin by interviewing persons who had graduated from college after
having been a member of a fraternity or sorority for at least 2 years. Which stage of test
development best describes the one that you are in?
a. the test-tryout stage
b. the test revision stage
c. the pilot work stage
d. the test construction stage
42. The method of paired comparisons is used to:

a. minimize the opportunity of selecting a socially desirable response
b. provide test-takers with a sufficient number of pairs of choices to express their “true”
opinions
c. provide test-takers with a limited number of pairs of choices in order to lessen testing
time
d. maximize the opportunity of selecting a socially desirable response
43. The statistical tool that is ideally suited for making selection decisions within the
framework of a compensatory model is:
a. expectancy data
b. multiple regression
c. the Brogden-Cronbach-Gleser formula
d. utility analysis
44. When a cut score is set based on norm-related considerations rather than on the
relationship of test score to a criterion, it is known as:
a. a referential cut score
b. a relative cut score
c. an absolute cut score
d. a fixed cut score
45. If an instructor assigns a grade of “A” to all students who earn 900 or more points
out of a total 1000 points during the semester, 900 represents:
a. the base rate of A-level students
b. the selection ratio
c. the cut score for an A
d. the success rate
46. A correlation coefficient is equal to .30. Using the concept of the coefficient of
determination, the variance accounted for by chance, error, and other unexplained
factors would be:
a. approximately 91%
b. approximately 30%
c. none of these
d. approximately 3%
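The arithmetic in question 46 can be checked with a short sketch. The coefficient of determination is the square of the correlation coefficient, and whatever remains is the proportion of variance left unexplained. The function name below is an illustrative assumption, not something taken from the exam.

```python
def unexplained_variance(r: float) -> float:
    """Proportion of variance NOT accounted for by a correlation of r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    r_squared = r ** 2       # coefficient of determination (shared variance)
    return 1.0 - r_squared   # share left to chance, error, and other factors


# Value from question 46; printed as a percentage to compare with the options.
print(f"{unexplained_variance(0.30) * 100:.0f}%")
```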
47. The term used to describe the proportion of people in a population who possess a
given characteristic is:
a. sensitivity
b. selection ratio
c. success rate
d. base rate
48. The hit rate is equivalent to:

a. the success rate/base rate of successful performance
b. the number of correct classifications/total number of classifications
c. the miss rate/ the selection ratio
d. the base rate/ the selection ratio
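As a rough sketch, assuming the usual definition of the hit rate as the proportion of cases a test classifies accurately, with the miss rate as its complement. The function names and counts are hypothetical and chosen only for illustration.

```python
def hit_rate(correct: int, total: int) -> float:
    """Proportion of classifications that were accurate (assumed definition)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct / total


def miss_rate(correct: int, total: int) -> float:
    """Proportion of classifications that were inaccurate."""
    return 1.0 - hit_rate(correct, total)


# Hypothetical screening result: 80 of 100 people classified accurately.
print(hit_rate(80, 100), miss_rate(80, 100))
```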
49. “Multiple predictors may be used so that applicants must meet or exceed the cut
score for each predictor before moving to the next round of the selection process.” What
process is being described?
a. compensatory model of selection
b. known-groups selection
c. multiple hurdle selection
d. top down selection
50. Which of the following represents a problem unique to self-report personality tests?
a. all of these
b. respondents may be too “low” on the construct being measured to register on the test
c. the reading ability of respondents may prevent them from responding accurately to
items
d. respondents might be unwilling to reveal something negative about themselves
51. A key difference between concurrent and predictive validity has to do with
a. the magnitude of the reliability coefficient that will be considered significant at the .05
level
b. the magnitude of the validity coefficient that will be considered significant at the .05
level
c. the time frame during which data on the criterion measure is collected
d. none of these
52. To ensure that a test developed for national use is indeed suitable for national use,
test developers:
a. all of these
b. post sample items on the Web to gauge response of different groups
c. have a culturally representative panel of experts review test items
d. employ a culturally representative group of examiners
53. If a time limit is long enough to allow test-takers to attempt all items, and if some
items are so difficult that no test-taker is able to obtain a perfect score, then the test is
referred to as a __________ test
a. reliable
b. valid
c. speed
d. power
54. Traditional measures of reliability are inappropriate for criterion-referenced tests
because variability
a. cannot be determined with criterion-referenced tests
b. is variable with criterion-referenced tests
c. is maximized with criterion-referenced tests
d. is minimized with criterion-referenced tests
55. The standard error of measurement of a particular test of anxiety is 8. A student
earns a score of 60. What is the confidence interval for this test score at the 95% level?
a. 36-84
b. 52-68
c. 40-68
d. 44-76
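The confidence interval in question 55 follows from the observed score plus or minus a z multiple of the standard error of measurement. The sketch below assumes z = 1.96 for the 95% level; many textbooks round this to 2, which is likely what the whole-number options reflect. The function name is an illustrative assumption.

```python
def confidence_interval(score: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Return (lower, upper) bounds: score +/- z * SEM."""
    if sem < 0:
        raise ValueError("sem must be non-negative")
    margin = z * sem
    return (score - margin, score + margin)


# Values from question 55, first with z = 1.96 and then with the rounded z = 2.
for z in (1.96, 2.0):
    low, high = confidence_interval(60, 8, z=z)
    print(f"z = {z}: {low:.2f} to {high:.2f}")
```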
56. You wish to determine if the student you are evaluating scored higher on a
mathematics test than on a reading test. What statistic(s) would you calculate?
a. the mean of each distribution and index of test difficulty for each test
b. the raw score on each test as well as the mean of each distribution
c. the standard error of the difference between two scores
d. the standard error of measurement for each test score
57. A significant, positive relationship exists between scores on a new test of
intelligence and scores on the fourth edition of the Stanford-Binet Intelligence Scale.
These data may be viewed as supportive of which type of validity evidence for the new
test?
a. discriminant evidence of construct validity
b. criterion-related validity
c. content validity
d. convergent evidence of construct validity
58. Which of the following increases the power of a statistical test?

a. changing from a two-tailed to a one-tailed test
b. changing alpha from .05 to .01
c. decreasing the sample size from N=100 to N=75
d. using a smaller critical area in the distribution of sample means
