
Test validity concerns the test and assessment procedures used in psychological and educational testing, and the extent to which these measure what they purport to measure. “Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.”[1] Although classical models divided the concept into various "validities" (such as content validity, criterion validity, and construct validity),[2] the currently dominant view is that validity is a single unitary construct.[3] Validity is generally considered the most important issue in psychological and educational testing because it concerns the meaning placed on test results.[4][3] Though many textbooks present validity as a static construct,[5] various models of validity have evolved since the first published recommendations for constructing psychological and educational tests.[6] These models can be categorized into two primary groups: classical models, which include several types of validity, and modern models, which present validity as a single construct. The modern models reorganize classical "validities" into either "aspects" of validity or types of validity-supporting evidence.[3][1]

Types of validity

In a research project there are several types of validity that may be sought. In summary:

• Construct: Constructs accurately represent reality.
  o Convergent: Simultaneous measures of the same construct correlate.
  o Discriminant: Doesn't measure what it shouldn't.
• Internal: Causal relationships can be determined.
• Conclusion: Any relationship can be found.
• External: Conclusions can be generalized.
• Criterion: Correlation with standards.
  o Predictive: Predicts future values of the criterion.
  o Concurrent: Correlates with other tests.
• Face: Looks like it'll work.


Construct validity
Construct validity occurs when the theoretical constructs of cause and effect accurately represent the real-world situations they are intended to model. This is related to how well the experiment is operationalized. A good experiment turns the theory (constructs) into actual things you can measure. Sometimes just finding out more about the construct (which itself must be valid) can be helpful. Construct validity is thus an assessment of the quality of an instrument or experimental design. It asks: 'Does it measure the construct it is supposed to measure?' If you do not have construct validity, you will likely draw incorrect conclusions from the experiment (garbage in, garbage out).

Convergent validity
Convergent validity occurs where measures of constructs that are expected to correlate do so. This may be positive or negative correlation. This is similar to concurrent validity (which looks for correlation with other tests).

Discriminant validity
Discriminant validity occurs where constructs that are expected not to relate do not, such that it is possible to discriminate between these constructs. Convergence and discrimination are often demonstrated by correlation of the measures used within constructs. Convergent validity and discriminant validity together demonstrate construct validity.

Nomological network
Defined by Cronbach and Meehl, this is the set of relationships between constructs and between consequent measures. The relationships between constructs should be reflected in the relationships between measures or observations.

Multitrait-Multimethod Matrix (MTMM)
Defined by Campbell and Fiske, this demonstrates construct validity by using multiple methods (e.g. survey, observation, test) to measure the same set of 'traits' and showing correlations in a matrix, where blocks and diagonals have special meaning. (A small numeric sketch of this correlation-matrix idea follows at the end of this section.)

Content validity
Content validity occurs when the experiment provides adequate coverage of the subject being studied. This includes measuring the right things as well as having an adequate sample. The perfect question gives a complete measure of all aspects of what is being investigated. However, in practice this is seldom likely; for example, a simple addition does not test the whole of mathematical ability. A high content validity question covers more of what is sought. A trick with all questions is to ensure that all of the target content is covered (preferably uniformly). Content validity is related very closely to good experimental design.

Internal validity
Internal validity occurs when it can be concluded that there is a causal relationship between the variables being studied. A danger is that changes might be caused by other factors. It is related to the design of the experiment, such as in the use of random assignment of treatments.

Conclusion validity
Conclusion validity occurs when you can conclude that there is a relationship of some kind between the two variables being examined.

External validity
External validity occurs when the causal relationship discovered can be generalized to other people, times and contexts. Correct sampling will allow generalization and hence give external validity. Samples should be both large enough and be taken from appropriate target groups. Achieving this level of validity thus makes results more credible.

Criterion-related validity
This examines the ability of the measure to predict a variable that is designated as a criterion. A criterion may well be an externally-defined 'gold standard'. Criterion-related validity is related to external validity.
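To make the correlation-matrix idea concrete, here is a minimal Python sketch (the traits, methods, and data are hypothetical, and numpy and pandas are assumed available; none of this comes from the original text) of checking convergent and discriminant validity for two traits, each measured by two methods:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Two hypothetical latent traits, uncorrelated by construction
anxiety = rng.normal(size=n)
extraversion = rng.normal(size=n)

# Each trait measured by two methods (survey, observation), with noise
scores = pd.DataFrame({
    "anxiety_survey":      anxiety + rng.normal(scale=0.5, size=n),
    "anxiety_observation": anxiety + rng.normal(scale=0.5, size=n),
    "extra_survey":        extraversion + rng.normal(scale=0.5, size=n),
    "extra_observation":   extraversion + rng.normal(scale=0.5, size=n),
})

# Same-trait/different-method correlations should be high (convergent);
# different-trait correlations should be low (discriminant).
print(scores.corr().round(2))
```

High values in the same-trait blocks and low values elsewhere are the pattern the MTMM logic looks for; full MTMM analyses also examine method-only blocks and reliability diagonals.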

Predictive validity
This measures the extent to which a future level of a variable can be predicted from a current measurement. This includes correlation with measurements made with different instruments. For example, a political poll intends to measure future voting intent. College entry tests should have a high predictive validity with regard to final exam results. (A numeric sketch of this appears after the Threats list below.)

Concurrent validity
This measures the relationship between measures made with existing tests. The existing test is thus the criterion. For example, a measure of creativity should correlate with existing measures of creativity.

Face validity
Face validity occurs where something appears to be valid. This of course depends very much on the judgment of the observer. Measures often start out with face validity, as the researcher selects those which seem likely to prove the point. In any case, it is never sufficient and requires more solid validity to enable acceptable conclusions to be drawn.

Threats
Validity as concluded is not always accepted by others, and perhaps rightly so. Typical reasons why it may not be accepted include:
• Inappropriate selection of constructs or measures.
• Insufficient data collected to make valid conclusions.
• Measurement done with too few measurement variables.
• Too great a variation in data (can't see the wood for the trees).
• Inadequate selection of target subjects.
• Complex interaction across constructs.
• Subjects giving biased answers or trying to guess what they should say.
• Measurement done in too few contexts.
• Experimental method not valid.
• Operation of experiment not rigorous.
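As a rough numeric illustration of predictive validity (correlating a current measurement with a later criterion), here is a minimal Python sketch; the entry scores and exam marks are invented for the example:

```python
import numpy as np

# Hypothetical data: college entry test scores and later final exam marks
entry_scores = np.array([52, 61, 48, 75, 66, 58, 71, 80, 45, 69])
final_marks  = np.array([55, 64, 50, 72, 70, 57, 68, 83, 49, 66])

# Pearson correlation between the current measure and the future criterion;
# a high value is evidence of predictive validity.
r = np.corrcoef(entry_scores, final_marks)[0, 1]
print(f"predictive validity (Pearson r) = {r:.2f}")
```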

Assessment is the Heart of Learning
So far in this series we have looked at how people teach and learn, and we have considered the methods that are available for us to use when we teach. We have discussed how we should go about planning a curriculum. In this article we look at a critically important aspect: the assessment of learning. Why is it so important?

Assessment drives learning. Students take great trouble to find out exactly what the examination will be like. Why is this? Because they want to pass the examination, of course! There is always too much to learn, so it makes sense to concentrate on what you need to know to pass the exam. REMEMBER: ASSESSMENT DRIVES LEARNING!

What does this mean, practically? It means we have to plan our assessment very carefully, in such a way that our students will learn what we want them to learn. If we want students to learn how to manage patients, our exam questions must be patient case studies in which we ask students what their management will be. If there is no practical in the exam, the students will stay out of the clinical areas to spend all their time with their books. But if they know there is going to be an OSPE, they will spend time with patients to make sure they have mastered all the skills. If they know that the test consists of clinical problems to diagnose and manage, they will study each clinical problem in such a way that they understand it well enough to diagnose and manage it.

Why do we Assess Students?
The main reason is obvious: we want to see if they have learnt what we have taught them. This kind of assessment, which is done at the end of a period of teaching, is called summative – it is a ‘summary’ of what the students have learnt. But there are also other reasons for assessment:
• Assessment is very important for our students, because it shows them where they are falling short while they are still learning. Assessment which is done in this way is called formative – we are ‘forming’ or ‘improving’ the students. That is why teachers should always discuss exams with students afterwards, to show them what the right answers were, and where they made mistakes. For the same reason, students must be given their marks, and their exam scripts, as soon as possible.
• We are training health workers to do a job. One of the reasons for our final examination of students is to make sure that they are safe. To protect society, we should only send out students who are safe – who know their work well enough not to harm anybody. Society expects us to do a good job!

Assessment should be Valid
Good assessment is valid. This means that it tests what it is supposed to test. Perhaps you want to test your students, to see if they can measure intraocular pressure. You can ask them to write short notes on how to use a Schiotz tonometer – but that will not tell you if they can really do the job. Your method of testing is not valid. A better way is to stand by and watch them while they do it with a patient – then you will really know if they can do it. This second method of testing is valid. We may want our students to be able to make diagnoses – but if our tests only test facts, the students will quickly learn just to memorise facts.

Some teachers like to ask ‘trick questions’ to catch out their students. Others like to ask questions about very rare, very obscure diseases. Such assessment is not valid. Valid assessment should be straightforward, and should focus on the ‘must know’ and ‘must be able to do’ – the things that are really necessary for day-to-day practice.

In an earlier article we discussed the domains of learning. We saw that each domain is taught in a different way. The same is true of assessment – we assess each domain in a different way. In the table below you will find examples of how we should assess the learning of our students, for each domain. If you follow the guidelines in this table, your assessment is likely to be valid – it will test what it is supposed to test.

Assessment should be Reliable
Good assessment is reliable. This means that if we repeat the assessment on the same student at another time, the mark will be the same. Some forms of assessment are more reliable than others. A written exam (where everyone gets the same questions) is generally more reliable than an oral one (where different candidates get asked different questions by different examiners). An OSPE is more valid than old-fashioned practicals which use different patients for different students. You can make any form of assessment more reliable by giving a little thought to the matter. Written exams are more reliable if the markers are guided by a very clear document which shows how marks are allocated for each question. Practical exams are more reliable if you use a checklist to mark the student's performance, or use another examiner. (A small sketch of checking agreement between two examiners follows at the end of this article.)

‘Assessment’ or ‘evaluation’?
These two words have different meanings for different people. In the UK people ‘assess’ students to find out if they have learnt, and they ‘evaluate’ programmes, to see if they are effective. In the United States the two words are often used the other way around – they ‘evaluate’ students and ‘assess’ programmes. It doesn't matter which word you use, as long as you tell other people what you mean.

Finally: teachers often spend more time on preparing lessons and teaching them than they do on assessing the results. Any time you spend on improving your assessment will be richly repaid – your students will be better learners as a result.
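One simple way to quantify the reliability of checklist marking, as a minimal sketch (the checklist marks and the two examiners are hypothetical), is the proportion of items on which two independent examiners agree:

```python
import numpy as np

# Hypothetical 0/1 checklist marks given to the same student's
# performance by two independent examiners.
examiner_a = np.array([1, 1, 0, 1, 1, 0, 1, 1])
examiner_b = np.array([1, 1, 0, 1, 0, 0, 1, 1])

# Proportion of checklist items on which the examiners agree;
# the closer to 100%, the more reliable the marking.
agreement = np.mean(examiner_a == examiner_b)
print(f"inter-rater agreement = {agreement:.0%}")
```

More formal indices exist (for example Cohen's kappa, which corrects for chance agreement), but even this simple check makes unreliable marking visible.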

The next article in this series deals with the resources that teachers and students need. Watch this space!

Footnotes

What is an OSPE?
The OSPE is a special kind of examination that is now commonly used. What do the letters mean?
• O stands for Objective. In this examination, every student gets the same patient – that is why we say it is objective. If different students are given different patients to examine, this could be unfair: some patients and conditions are easier to examine than others.
• S stands for Structured. Several skills are tested at one time, and each skill is tested in a separate room, called a station. It could be manual skills, like examining the anterior chamber of the eye, or it could be a communication skill, like taking a patient's history. At each station there is a card with clear instructions for the student, a patient (if necessary), all the equipment s/he needs, and an examiner with a checklist for doing the marking. There may be ten such stations in an OSPE, and ten students are then examined together. Each starts at a different station, and after 10–15 minutes a bell rings and they move on to the next one.
• P stands for Practical. This means that this exam is practical – it only tests the skills of the students.
• E stands for Examination – no surprises there! (Some people prefer the word Clinical – so that makes their exam an ‘OSCE’.)
Good OSPEs are an excellent way of examining skills. They take a lot of time and preparation, but so do all practical examinations.

Multiple Choice Questions – beautiful but deadly?
MCQs consist of a leading statement or scenario, followed by a number of answers or options for students to choose from. They have become very popular and they are also very easy to mark. On the other hand they have a number of serious drawbacks:
• Examiners tend to use them to assess facts, rather than patient management
• If there are only two or three options, students may get marks from guessing
• Research has shown that students very often misunderstand them
For all these reasons MCQs often have low validity. People who write MCQs should receive some form of training first, or consult a textbook. They have to be carefully tested for comprehension, before being given to students to use.

Advantages and Disadvantages of the Rubric

Holistic Rubric Advantages
• Quick scoring, and they provide an overview of student achievement.
• Easily obtain a single dimension if that is adequate for your purpose.

Holistic Rubric Disadvantages

• Do not provide very detailed information.
• Not very useful to help plan instruction, because they lack a detailed analysis of a student's strengths or weaknesses in a product.

Analytical Rubric Advantages
• Provides meaningful and specific feedback along multiple dimensions.
• Scoring tends to be more consistent across students and grades.
• Easier for the teacher to share with students and parents about certain strengths and weaknesses.

Analytical Rubric Disadvantages
• It is more difficult to construct analytical rubrics for all tasks.
• Lower consistency among different raters.
• Tends to be quite time consuming.

Advantages of Rubrics in General
• Forces the teacher to clarify criteria in detail.
• Provides useful feedback on the effectiveness of instruction.
• Narrows the gap between instruction and assessment.
• Helps students to better understand the nature of quality work.
• Motivates students to reach the standards specified.
• Potential to open communication with caregivers.
• Gives the child more control of their own learning process.
• Flexible tool, having uses across many contexts, in many grade levels and for a wide range of abilities.
• Potential to be transferred into grades if necessary.
• Can offer a method of consistency in scoring by clearly defining the performance criteria.

Disadvantages of Rubrics in General
• Rubrics can restrict the student's mind power, in that students will feel that they need to complete the assignment strictly to the rubric instead of taking the initiative to explore their learning.
• If the criteria in the rubric are too complex, students may feel overwhelmed with the assignment, and little success may be imminent.
• For the teacher creating the rubric, the task of developing, testing, evaluating, and updating can be time consuming.

What is a rubric?
• A rubric is a scoring guide that seeks to evaluate a student’s performance based on the sum of a full range of criteria rather than a single numerical score.
• A rubric is a working guide for students and teachers, usually handed out before the assignment begins in order to get students to think about the criteria on which their work will be judged.
• It is a formative type of assessment because it becomes an ongoing part of the whole teaching and learning process. Students themselves are involved in the assessment process through both peer and self-assessment. This involvement empowers the students and, as a result, their learning becomes more focused and self-directed. A rubric enhances the quality of direct instruction.
• A rubric is an authentic assessment tool used to measure students’ work. It is designed to simulate real-life activity where students are engaged in solving real-life problems. Authentic assessment, therefore, blurs the lines between teaching, learning, and assessment.
  o Authentic assessment is used to evaluate students’ work by measuring the product according to real-life criteria. The same criteria used to judge a published author would be used to evaluate students’ writing.
  o Although the same criteria are considered, expectations vary according to one’s level of expertise. The performance level of a novice is expected to be lower than that of an expert and would be reflected in different standards. For example, in evaluating a story, a first-grade author may not be expected to write a coherent paragraph to earn a high evaluation; a tenth grader would need to write coherent paragraphs in order to earn high marks.

The advantages of using rubrics in assessment are that they:
• allow assessment to be more objective and consistent
• focus the teacher to clarify his/her criteria in specific terms
• clearly show the student how their work will be evaluated and what is expected
• promote student awareness of the criteria to use in assessing peer performance
• provide useful feedback regarding the effectiveness of the instruction
• provide benchmarks against which to measure and document progress

Rubrics can be created in a variety of forms and levels of complexity; however, they all contain common features which:
• focus on measuring a stated objective (performance, behavior, or quality)
• use a range to rate performance
• contain specific performance characteristics arranged in levels indicating the degree to which a standard has been met
(These common features are pictured as a small data structure below.)

In this module you will create your own rubric for assessing student performance regarding a given objective. As students become familiar with rubrics, they can assist in the rubric design process. Articles on the Web and some examples of rubrics will focus your effort and stimulate your creativity.
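As promised above, here is one way to picture those common features in code: each criterion carries performance characteristics arranged in ordered levels. This is a minimal Python sketch; the criteria, descriptors, and helper function are invented for illustration:

```python
# A hypothetical writing rubric: each criterion maps to performance
# characteristics arranged in levels, from lowest (1) to highest (4).
rubric = {
    "organization": ["no clear structure",
                     "some structure",
                     "mostly logical flow",
                     "clear, coherent paragraphs"],
    "mechanics":    ["frequent errors",
                     "several errors",
                     "few errors",
                     "virtually error-free"],
}

def describe(criterion: str, level: int) -> str:
    """Return the performance characteristic for a 1-based level."""
    return rubric[criterion][level - 1]

print(describe("organization", 4))  # -> clear, coherent paragraphs
```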

Why use rubrics?
Many experts believe that rubrics improve students’ end products and therefore increase learning. When teachers evaluate papers or projects, they know implicitly what makes a good final product and why. When students receive rubrics beforehand, they understand how they will be evaluated and can prepare accordingly. Developing a grid and making it available as a tool for students’ use will provide the scaffolding necessary to improve the quality of their work and increase their knowledge.

There are many advantages to using rubrics:
• Teachers can increase the quality of their direct instruction by providing focus, emphasis, and attention to particular details as a model for students.
• Students have explicit guidelines regarding teacher expectations.
• Students can use rubrics as a tool to develop their abilities.
• Teachers can reuse rubrics for various activities.
• Rubrics set standards: information on the expected quality of the task performed is given to students, who know in advance what they have to do to achieve a certain level.
• Rubrics clarify expectations: when levels are described in clear language, everyone knows what is required, and the quality of student work will improve.
• Rubrics tell students they must do a careful job: students use rubrics to help study the information the teacher values.
• Rubrics help students take responsibility for their own learning: reviewing, re-conceptualizing, and revisiting the same concepts from different angles improves understanding of the lesson for students.

Because the essentials remain constant, it is not necessary to create a completely new rubric for every activity. Once a rubric is created, it can be used for a variety of activities: an established rubric can be used, or slightly modified, and applied to many activities. For example, the standards for excellence in a writing rubric remain constant throughout the school year; what does change is students’ competence and your teaching strategy. Rubrics can be created for any content area including math, science, history, writing, foreign languages, drama, music, art, and even cooking! Once developed, they can be modified easily for various grade levels.

In brief:
• Prepare rubrics as guides students can use to build on current knowledge.
• Consider rubrics as part of your planning time, not as an additional time commitment to your preparation.

The following rubric was created by a group of postgraduate education students at the University of San Francisco, but could be developed easily by a group of elementary students.

“A rubric is only as useful as it is good. Using a bad rubric is a waste of time…” – Michael Simkins in “Designing Great Rubrics”

Analytic vs. Holistic Rubrics
• A holistic rubric gives a single score or rating for an entire product or performance, based on an overall impression of a student’s work.
• An analytical trait rubric divides a product or performance into essential traits or dimensions so that they can be judged separately – one analyzes a product or performance for essential traits. (A sketch contrasting the two closes this section.)

Advantages of Rubrics
• Helps the grading process become more efficient.
• Clarifies quality expectations to students about their assignments.
• Students are able to self-assess their own work prior to submitting it.
• Helps faculty grade/score more accurately, fairly and reliably.
• Requires faculty to set and define more precisely the criteria used in the grading process.
• Supports uniform and standardized grading processes among different faculty members.
• Helps communicate grades between faculty and students.
• Helps improve student performance, because students know what to focus on.
• Students can understand better the rationale and the reason for grades.
• Rubrics have value to other stakeholders: anyone (including colleagues, parents and community members) seeing a rubric and a student score based on that rubric knows what content was mastered by that student.

Disadvantages of Rubrics
• Rubrics are hard to design; development can be complex and time-consuming.
• Defining the correct set of criteria to define performance can be complex.
• Using the correct language to express performance expectations can be difficult.
• Rubrics might need to be continuously revised before they can actually be used in an easy fashion.
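As promised, the analytic/holistic distinction can be summarized in code: a holistic rubric yields one overall rating from an overall impression, while an analytic trait rubric reports a separate rating per trait. This is a minimal Python sketch; the trait names, scores, and the rounded-mean stand-in for “overall impression” are all invented for illustration:

```python
from statistics import mean

# Hypothetical per-trait ratings (1-4) for one student's essay.
trait_scores = {"ideas": 3, "organization": 4, "mechanics": 2}

def holistic_score(scores: dict[str, int]) -> int:
    """One overall rating; a real holistic judgement is impressionistic,
    approximated here as the rounded mean of the traits."""
    return round(mean(scores.values()))

def analytic_report(scores: dict[str, int]) -> dict[str, int]:
    """A separate rating for each essential trait, judged independently."""
    return dict(scores)

print(holistic_score(trait_scores))   # -> 3
print(analytic_report(trait_scores))  # -> {'ideas': 3, 'organization': 4, 'mechanics': 2}
```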