• Validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring).

• A valid form of assessment is one which measures what it is supposed to measure.

Content validity
• Content validity deals with whether the assessment content and composition are appropriate, given what is being measured.
• Example: Researchers aim to study mathematical learning and create a survey to test for mathematical skill. If these researchers only tested for multiplication and then drew conclusions from that survey, their study would not show content validity, because it excludes other mathematical operations.

Content Validity vs. Face Validity
• Content validity is carefully evaluated, whereas face validity is a more general measure, and the subjects often have input.

Face Validity
• Face validity only means that the test looks like it works. It does not mean that the test has been proven to work.

Criterion-Related Validity
• Criterion-related validity, also referred to as instrumental validity, is used to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure that has been demonstrated to be valid.

• Concurrent Validity
- Occurs when the criterion measures are obtained at the same time as the test scores.
- Example: testing a group of students for intelligence with an established IQ test and then administering the new intelligence test a couple of days later would be perfectly acceptable, because both sets of scores are obtained at roughly the same time.
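One way to make this comparison concrete is to correlate scores on the new test with scores on the established one. The sketch below assumes a Pearson correlation is an acceptable measure of agreement (the text does not name a statistic), and both score lists are made-up illustration data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six students on both tests
established_iq = [95, 100, 105, 110, 120, 130]  # criterion (already validated)
new_test = [48, 52, 55, 57, 63, 70]             # test being evaluated

validity_coefficient = pearson_r(established_iq, new_test)
print(round(validity_coefficient, 3))
```

A coefficient close to 1 would suggest that the new test ranks students much as the established test does; a value near 0 would suggest it measures something else.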

• Predictive Validity
- Involves testing a group of subjects for a certain construct and then comparing their scores with results obtained at some point in the future.
- Example: college entrance testing. When students apply to colleges, they are usually required to submit test scores from examinations such as the SAT or the ACT. These scores are used as a basis for comparison, with evaluators looking at the performance of students who took similar tests in the past.

Construct Validity
• The extent to which an assessment corresponds to other variables, as predicted by some rationale or theory.
• Construct validity is valuable in the social sciences, where many concepts are subjective. Often, there is no accepted unit of measurement for constructs, and even fairly well-known ones, such as IQ, are open to debate.

Factors that influence validity
• History: Outside events occurring during the course of the experiment or between repeated measures of the dependent variable may influence the results. This does not make the test itself any less accurate.
• Maturation: Change due to aging or development, either between or within groups.
• Instrumentation: The reliability of the instrument may change through changes in calibration (if using a measuring device) or changes in the human ability to measure differences (due to fatigue, experience, etc.).
• Testing: The experience of taking the test influences the results.
• Experimenter bias: Expectations of an outcome may cause the experimenter to view the data in a different way.

Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly.

• Test-retest method (consistency over time)
• Equivalent form
• Inter-rater reliability (consistency between raters)
• Internal consistency (consistency of the items)
• Split-half method

Test-Retest Reliability
• This kind of reliability is used to assess the consistency of a test across time. It assumes that there will be no change in the quality or construct being measured.
• Test-retest reliability is best used for things that are stable over time, such as intelligence.

Form equivalence
• Two different forms of a test, based on the same content, are administered on one occasion to the same examinees.
• After alternate forms have been developed, they can be used for different examinees. An examinee who took Form A earlier could not share the test items with another student who might take Form B later, because the two forms have different items.

Inter-rater Reliability
• This type of reliability is assessed by having two or more independent judges score the test. The scores are then compared to determine the consistency of the raters' estimates.
• One way to test inter-rater reliability is to have each rater assign each test item a score. For example, each rater might score items on a scale from 1 to 10. Next, you would calculate the correlation between the two sets of ratings to determine the level of inter-rater reliability.
• Another means of testing inter-rater reliability is to have raters determine which category each observation falls into and then calculate the percentage of agreement between the raters. So, if the raters agree 8 out of 10 times, the test has an 80% inter-rater reliability rate.
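The percentage-of-agreement calculation can be sketched in a few lines. The category labels and the two lists of ratings below are hypothetical, chosen so that the raters agree on 8 of 10 observations as in the example above.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of observations that two raters placed in the same category."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("Both raters must rate the same observations")
    agreements = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100 * agreements / len(ratings_a)

# Hypothetical category assignments for ten observations
rater_a = ["A", "B", "A", "C", "B", "A", "C", "B", "A", "A"]
rater_b = ["A", "B", "A", "C", "A", "A", "C", "B", "B", "A"]

print(percent_agreement(rater_a, rater_b))  # 80.0
```

Simple percent agreement does not correct for agreement that would occur by chance, so it is best read as a rough indicator.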

Internal Consistency Reliability
• This form of reliability is used to judge the consistency of results across items on the same test. Essentially, you are comparing test items that measure the same construct to determine the test's internal consistency.
• When you see a question that seems very similar to another test question, it may indicate that the two questions are being used to gauge reliability. Because the two questions are similar and designed to measure the same thing, the test taker should answer both questions the same way, which would indicate that the test has internal consistency.
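The text does not name a statistic for internal consistency. One widely used choice, assumed here purely for illustration, is Cronbach's alpha, which compares the variance of individual items with the variance of the total score. The ratings below are made-up data (five test takers, four items on a 1 to 5 scale).

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per test taker, one column per item."""
    k = len(scores[0])                    # number of items
    items = list(zip(*scores))            # transpose: one tuple of scores per item
    sum_item_variances = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Hypothetical ratings: rows are test takers, columns are items
ratings = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Values closer to 1 indicate that the items tend to rise and fall together across test takers.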


Split-Half Method
• A test is given and divided into halves that are scored separately; the scores on one half are then compared with the scores on the other half to estimate reliability (Kaplan & Saccuzzo, 2001).

Why use Split-Half?
• Split-half reliability is a useful measure when it is impractical or undesirable to assess reliability with two tests or to have two test administrations (because of limited time or money) (Cohen & Swerdlik, 2001).

How do I use Split-Half?
• 1st - Divide the test into halves. The most common way to do this is to assign odd-numbered items to one half of the test and even-numbered items to the other; this is called odd-even reliability.
• 2nd - Find the correlation of scores between the two halves by using the Pearson r formula.
• 3rd - Adjust the correlation using the Spearman-Brown formula, which raises the estimate to the reliability expected of the full-length test. The longer the test, the more reliable it is, so the Spearman-Brown formula must be applied to a test that has been shortened, as is done in split-half reliability (Kaplan & Saccuzzo, 2001).
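The three steps can be sketched as follows. The response matrix is hypothetical (five examinees, six items scored 1 for correct and 0 for incorrect), and the Spearman-Brown correction is written in its two-halves form, r_full = 2r / (1 + r).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_brown(r_half):
    """Step a half-test correlation up to a full-length reliability estimate."""
    return 2 * r_half / (1 + r_half)

# Hypothetical item responses: one row per examinee, one column per item
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
]

# Step 1: odd-even split (items 1, 3, 5 versus items 2, 4, 6)
odd_scores = [sum(row[0::2]) for row in responses]
even_scores = [sum(row[1::2]) for row in responses]

# Step 2: correlate the two half-test scores
r_half = pearson_r(odd_scores, even_scores)

# Step 3: apply the Spearman-Brown correction
r_full = spearman_brown(r_half)

print(round(r_half, 3), round(r_full, 3))
```

For any positive half-test correlation below 1, the corrected value is higher than the raw one, reflecting that the full test is twice as long as either half.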

Factors Affecting Reliability
• Administrator Factors
• Number of Items on the Instrument
• The Test Taker
• Heterogeneity of the Items and Group Members
• Length of Time between Test and Retest

Administrator Factors
• Poor or unclear directions given during administration or inaccurate scoring can affect reliability.
For example, say you were told that your scores on a measure of sociability would determine your promotion. Your answers are then more likely to reflect what you think the administrators want than what your behavior actually is.

Number of Items on the Instrument
• The larger the number of items, the greater the chance for high reliability. For example, twenty questions on your leadership style are more likely to produce a consistent result than four questions.
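The effect of test length can be illustrated with the general form of the Spearman-Brown formula, where n is the factor by which the test is lengthened. The numbers are assumptions for illustration only: a four-question scale with a reliability of 0.50 is extended to twenty comparable questions (n = 5).

```latex
r_{\text{new}} = \frac{n \, r_{\text{old}}}{1 + (n - 1) \, r_{\text{old}}}
\qquad\Rightarrow\qquad
r_{\text{new}} = \frac{5 \times 0.50}{1 + 4 \times 0.50} = \frac{2.5}{3.0} \approx 0.83
```

The formula assumes the added questions are of the same quality as the original ones, so it is an estimate rather than a guarantee.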

The Test Taker
• For example, if you took an instrument in August when you had a terrible flu and then in December when you were feeling quite good, we might see a difference in the consistency of your responses. If you were under considerable stress of some sort, or if you were interrupted while answering the instrument questions, you might give different responses.

Heterogeneity of the Items and Group Members
• Heterogeneity of the Items -- The greater the heterogeneity (differences in the kind of questions or difficulty of the questions) of the items, the greater the chance for high reliability correlation coefficients.
• Heterogeneity of the Group Members -- The greater the heterogeneity of the group members in the preferences, skills or behaviors being tested, the greater the chance for high reliability correlation coefficients.

Length of Time between Test and Retest
• The shorter the time, the greater the chance for high reliability correlation coefficients.
• As we have experiences, we tend to adjust our views a little from time to time. Therefore, the time interval between the first time we took an instrument and the second time is really an "experience" interval.
• Experience happens, and it influences how we see things. Because internal consistency has no time lapse, one can expect it to have the highest reliability correlation coefficient.

Relationship between validity and reliability
• A test cannot be considered valid unless the measurements resulting from it are reliable. Conversely, results from a test can be reliable and not necessarily valid.