VALIDITY AND RELIABILITY IN RESEARCH

VALIDITY

✓ Validity implies the extent to which the research instrument measures what it is intended to measure.
✓ It refers to the ability of the instrument/test to measure what it is supposed to measure.
✓ Validity looks at accuracy. Without validity, research goes in the wrong direction. Validity is generally considered more important than reliability.
✓ Validity is the accuracy of a measure, or the extent to which a score truthfully represents a concept.

The question of validity is raised in the context of three points:
1. The form of the test
2. The purpose of the test
3. The population for whom it is intended

WHAT IS A CONSTRUCT?

✓ A construct refers to a concept or characteristic that can't be directly observed but can be measured by observing other indicators that are associated with it.
✓ Constructs can be characteristics of individuals, such as intelligence, obesity, job satisfaction, or depression.
✓ They can also be broader concepts applied to organizations or social groups, such as gender equality, corporate social responsibility, or freedom of speech.

✓ Example: There is no objective, observable entity called "depression" that we can measure directly. But based on existing psychological research and theory, we can measure depression based on a collection of symptoms and indicators, such as low self-confidence and low energy levels.

INTERNAL VALIDITY VS EXTERNAL VALIDITY

| Basis of Comparison | Internal Validity | External Validity |
|---|---|---|
| Meaning | The extent to which the experiment is free from errors and any difference in measurement is due to the independent variable and nothing else. | The extent to which the research results can be generalized to the world at large. |
| Concerned with | Control | Naturalness |
| What is it? | A measure of the accuracy of the experiment. | Checks whether the causal relationship discovered in the experiment can be generalized or not. |
| Identifies | How strong are the research methods? | Can the outcome of the research be applied to the real world? |
| Describes | The degree to which the conclusion is warranted. | The degree to which the study is warranted to generalize the result to other contexts. |
| Used to | Address or eliminate alternative explanations for the result. | Generalize the outcome. |

KINDS OF VALIDITY

✓ Construct Validity (Measurement)
  - Translation: Face, Content
  - Criterion: Predictive, Concurrent, Convergent, Discriminant
✓ Inference Validity (Studies)
  - Internal
  - External: Population, Ecological

✓ The inference validity of a research design is the validity of the entirety (completeness) of the research. It indicates whether one can trust the conclusions or not. Inference validity is further divided into two sub-sections: internal validity and external validity.

INTERNAL VALIDITY

✓ Internal validity checks the consistency of the conclusions claimed, especially those related to causality (cause and effect), with the results and design of the research. It tells how well a study is conducted.
✓ The internal validity of any research rests on three conditions (a sketch of checking the first one follows below):
  1. The independent and dependent variables in the study should change together.
  2. The independent variable should precede the dependent variable in the study.
  3. Any other extraneous factors should not explain the result of the study.
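The first condition, covariation between the independent and dependent variables, can be checked with a simple correlation. The sketch below is not from the original slides: the data and variable names are hypothetical, and a strong correlation alone does not establish the other two conditions.

```python
import numpy as np

# Hypothetical data: hours of treatment received (independent variable)
# and a symptom score (dependent variable) for eight participants.
treatment_hours = np.array([0, 1, 2, 3, 4, 5, 6, 7])
symptom_score = np.array([9, 8, 8, 6, 5, 5, 3, 2])

# Pearson correlation: a strong value (here, strongly negative) supports
# the covariation condition, but says nothing about temporal precedence
# or about ruling out extraneous factors.
r = np.corrcoef(treatment_hours, symptom_score)[0, 1]
print(f"Covariation between IV and DV: r = {r:.2f}")
```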
EXTERNAL VALIDITY

✓ External validity is based on the idea that findings extend beyond the particular group of people studied: when we say that results are generalizable, we mean that they hold when research is conducted with other groups, and other researchers can assume that their results would be the same if they use the same methods.

Types of External Validity

✓ Population Validity: The ability to generalize results from the research sample to a larger population.
✓ Ecological Validity: Refers to the extent to which the findings of a research study can be generalized to realistic, real-life settings. It is often applied in experimental studies of human behavior and cognition, such as in psychology and related fields. If a test has high ecological validity, it can be generalized to other real-life situations, while tests with low ecological validity cannot.

TRANSLATION VALIDITY

✓ Translation validity refers to a subjective evaluation that examines whether the selected measures of the study reflect the overall desired aim of the study. It is further divided into two types: face validity and content validity.

✓ Content Validity: Evaluates how well an instrument (like a test) covers all relevant parts of the construct it aims to measure. Here, a construct is a theoretical concept, theme, or idea, in particular one that cannot usually be measured directly.
✓ Example (content validity in exams): A written exam tests whether individuals have enough theoretical knowledge to acquire a driver's license. The exam has high content validity if the questions cover every topic in the course related to traffic rules, while excluding all questions that aren't relevant to the driver's license.

✓ Face Validity: Considers whether a measure appears, on the surface, to measure what it's supposed to measure, i.e., whether it seems relevant and appropriate for what it's assessing. To have face validity, your measure should be:
  1. Clearly relevant for what it's measuring
  2. Appropriate for the participants
  3. Adequate for its purpose

CRITERION VALIDITY

✓ Evaluates how accurately a test measures the outcome it was designed to measure. An outcome can be a disease, behavior, or performance. Concurrent validity compares the test against a criterion measured at the same time, while predictive validity compares it against outcomes in the future.
✓ To establish criterion validity, you need to compare your test results to criterion variables. Criterion variables are often referred to as a "gold standard" measurement. They comprise other tests that are widely accepted as valid measures of a construct.
✓ Example (criterion validity): A researcher wants to know whether a college entrance exam is able to predict future academic performance. First-semester GPA can serve as the criterion variable, as it is an accepted measure of academic performance.

KINDS OF CRITERION VALIDITY

✓ Predictive: Refers to the ability of a test to predict a future outcome, i.e., a criterion that occurs at some point in the future. Example: an entrance exam has predictive validity when the scores of a group of participants accurately identify those who will perform well later.
✓ Concurrent: Also called criterion-related concurrent validity. Concurrent validity means that your test measures the same way as another test in the same area that has already been proven to be valid. This type of validity is established by giving two tests, yours and the one already validated, and correlating the scores.
✓ Convergent: Measuring the same concept with very different methods. If different methods yield the same result, then convergent validity is supported. Example: different survey items used to measure decision-making style, closed and open-ended.
✓ Discriminant: Discriminant validity shows you that two tests that are not supposed to be related are, in fact, unrelated.
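In practice, criterion validity is quantified by correlating test scores with the criterion variable, as the slides describe for the concurrent case. The sketch below follows the entrance-exam example with entirely hypothetical numbers; what counts as a "good" coefficient depends on the field.

```python
import numpy as np

# Hypothetical data: entrance-exam scores and first-semester GPA
# (the criterion variable) for six students.
exam_score = np.array([1200, 1350, 1100, 1500, 1250, 1400])
first_sem_gpa = np.array([3.1, 3.4, 2.8, 3.8, 3.0, 3.6])

# The validity coefficient is the correlation between the test and
# the criterion; values near 1 indicate strong predictive validity.
r = np.corrcoef(exam_score, first_sem_gpa)[0, 1]
print(f"Criterion (predictive) validity coefficient: r = {r:.2f}")
```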
SUMMARY: KEY QUESTIONS FOR EACH KIND OF VALIDITY

✓ Content Validity: Does the measure adequately measure the concept?
✓ Face Validity: Do "experts" validate that the instrument measures what its name suggests it measures?
✓ Construct Validity: Does the instrument tap the concept as theorized?
✓ Criterion Validity: Does the measure differentiate in a manner that helps to predict a criterion variable?
✓ Predictive: Does the measure differentiate individuals in a manner that helps predict a future criterion?
✓ Concurrent: Does the measure differentiate in a manner that helps to predict a criterion variable concurrently?
✓ Convergent: Do two instruments measuring the concept correlate highly?
✓ Discriminant: Does the measure have a low correlation with a variable that is supposed to be unrelated to it?

RELIABILITY

✓ Reliability refers to how consistently a method measures something. If the same result can be consistently achieved by using the same methods under the same circumstances, the measurement is considered reliable.
✓ Example: You measure the temperature of a liquid sample several times under identical conditions. The thermometer displays the same temperature every time, so the results are reliable.
✓ Example: A doctor uses a symptom questionnaire to diagnose a patient with a long-term medical condition. Several different doctors use the same questionnaire with the same patient but give different diagnoses. This indicates that the questionnaire has low reliability as a measure of the condition.

RELIABILITY AND VALIDITY

✓ Reliability indicates the degree to which a person's test scores are stable, or reproducible, and free from measurement error.
✓ If test scores are not reliable, they cannot be valid, since they will not provide a good estimate of the ability or trait that the test intends to measure. Reliability is therefore a necessary but not sufficient condition for validity.
✓ There cannot be validity without reliability, but there can be reliability without validity: a valid instrument is always reliable, while a reliable instrument need not be a valid instrument.
✓ Reliability on its own is not enough to ensure validity. Even if a measure is reliable, it may not accurately reflect the real situation.
✓ Example: The thermometer that you used to test the sample gives reliable results. However, the thermometer has not been calibrated properly, so the result is 4 degrees lower than the true value. Therefore, the measurement is not valid.

RELIABILITY: INTERNAL CONSISTENCY

✓ Internal consistency assesses the correlation between multiple items in a test that are intended to measure the same construct.
✓ You can calculate internal consistency without repeating the test or involving other researchers, so it's a good way of assessing reliability when you only have one data set.
✓ Why it's important: When you devise a set of questions or ratings that will be combined into an overall score, you have to make sure that all of the items really do reflect the same thing. If responses to different items contradict one another, the test might be unreliable.
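The slides do not name a specific statistic for internal consistency, but Cronbach's alpha is the most widely used index. Below is a minimal sketch; the cronbach_alpha helper and the 5-point ratings are invented for illustration.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                          # number of items
    item_variances = item_scores.var(axis=0, ddof=1)  # variance per item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of total score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical 5-point ratings: 6 respondents x 4 items intended to
# measure the same construct.
scores = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")  # ~0.96, high consistency
```

A common rule of thumb treats alpha above roughly 0.7 as acceptable, though the appropriate cutoff depends on the purpose of the test.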
KINDS OF RELIABILITY (ACCURACY IN MEASUREMENT)

✓ Stability: Test-Retest Reliability, Parallel-Form Reliability
✓ Consistency: Internal (Inter-Item) Consistency, Split-Half Reliability, Inter-Rater Reliability

✓ Test-Retest Reliability: The reliability coefficient obtained by repeating an identical measure on a second occasion.
✓ Parallel-Form Reliability: The reliability coefficient obtained from two comparable sets of measures.
✓ Internal Consistency: A test of the consistency of respondents' responses to all the items in a measure.
✓ Inter-Rater Reliability: The consistency of the judgement of several raters on how they see a phenomenon or interpret some responses.
✓ Split-Half Reliability: Reflects the correlation between two halves of an instrument.

TYPES OF RELIABILITY

Test-Retest Reliability
✓ The consistency of a measure across time: do you get the same results when you repeat the measurement?
✓ Example: A group of participants complete a questionnaire designed to measure personality traits. If they repeat the questionnaire days, weeks or months apart and give the same answers, this indicates high test-retest reliability.

Parallel-Form Reliability
✓ Parallel-forms reliability relates to a measure that is obtained by assessing the same phenomenon with the same sample group via more than one assessment method.
✓ Example: The level of employee satisfaction at ABC Company may be assessed with questionnaires, in-depth interviews and focus groups, and the results can be compared.

Inter-Rater Reliability
✓ The consistency of a measure across raters or observers: do you get the same results when different people conduct the same measurement?
✓ Example: Based on an assessment criteria checklist, five examiners submit substantially different results for the same student project. This indicates that the assessment checklist has low inter-rater reliability (for example, because the criteria are too subjective).

Internal Consistency
✓ The consistency of the measurement itself: do you get the same results from different parts of a test that are designed to measure the same thing?
✓ Example: You design a questionnaire to measure self-esteem. If you randomly split the results into two halves, there should be a strong correlation between the two sets of results. If the two results are very different, this indicates low internal consistency.

Split-Half Reliability
✓ Split-half reliability, another type of internal consistency reliability, involves splitting all the items of a test in half and determining the correlation between the first half of the measurement and the second half of the measurement. A sketch of this computation appears after the summary table below.

VARIOUS TYPES OF RELIABILITY

| Type of Reliability | Measures the Consistency of... |
|---|---|
| Test-Retest Reliability | The same test over time (same people, different times). |
| Inter-Rater Reliability | The same test conducted by different people (different people, same test). |
| Parallel Forms | Different versions of a test which are designed to be equivalent (different people, same time, different test). |
| Internal Consistency | The individual items of a test (different questions, same construct). |
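The split-half computation just described can be carried out directly. The sketch below uses hypothetical item scores and an odd/even item split; the Spearman-Brown correction at the end is the standard adjustment for estimating full-test reliability from a half-test correlation (it is not mentioned in the slides but is conventionally reported with split-half results).

```python
import numpy as np

# Hypothetical scores: 6 respondents x 6 items of one test.
scores = np.array([
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 5, 4, 5, 5, 5],
    [3, 3, 3, 4, 3, 3],
    [1, 2, 1, 2, 2, 1],
    [4, 4, 5, 4, 4, 4],
])

# Split the items into two halves (odd- vs even-numbered items)
# and total each respondent's score on each half.
half_a = scores[:, ::2].sum(axis=1)
half_b = scores[:, 1::2].sum(axis=1)

# Correlation between the two halves.
r_half = np.corrcoef(half_a, half_b)[0, 1]

# Spearman-Brown correction: estimated reliability of the full test.
r_full = 2 * r_half / (1 + r_half)
print(f"Half-test r = {r_half:.2f}, Spearman-Brown full-test estimate = {r_full:.2f}")
```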
VARIOUS TYPES OF RELIABILITY (METHODOLOGY)

| Type | What It Is | How You Do It | Example Methodology |
|---|---|---|---|
| Test-Retest | A measure of stability. | Administer the same test/measure at two different times to the same group of participants. | Measuring a property that you expect to stay the same over time. |
| Parallel Forms | A measure of equivalence. | Administer two different forms of the same test to the same group of participants. | Using two different tests to measure the same thing. |
| Inter-Rater | A measure of agreement. | Have two raters rate behaviours and then determine the amount of agreement between them. | Multiple researchers making observations or ratings about the same topic. |
| Internal Consistency | A measure of how consistently each item measures the same underlying concept. | Correlate performance on each item with overall performance across participants. | Using a multi-item test where all the items are intended to measure the same variable. |

VALIDITY VS RELIABILITY

| Basis of Comparison | Validity | Reliability |
|---|---|---|
| Meaning | The extent to which the research instrument measures what it is intended to measure. | The degree to which an assessment tool produces consistent results when repeated measurements are made. |
| Main Focus | The outcome. | Maintaining consistent results. |
| Influencing Factors | Process, purpose, theory matters, logical implications, etc. | Test length, test score variability, heterogeneity, etc. |
| Result Derivation | Requires more research and is more difficult to attain. | Comparatively simpler and quicker to establish. |
| Requirement | Validity is not a requirement for reliability. | Reliability is a requirement for validity. |
| Tools | Validity considers "precision". | Reliability considers "consistency and repeatability". |
| Utility | The test is completely useless if the findings are invalid. | The test is not very useful if the results cannot be reproduced. |

RELATIONSHIP BETWEEN VALIDITY AND RELIABILITY

✓ A valid test should be reliable, but a reliable test may not be valid.
✓ Validity is more important than reliability (according to the courts).
✓ To be useful, an instrument (test, scale) must be both reasonably reliable and valid; for example, a sound selection test helps management to identify the actual potential of employees.
✓ Aim for validity first, and then try to make the test more reliable little by little.

✓ With this in mind, it can be helpful to conceptualize the following four basic scenarios for the relation between reliability and validity:
  1. Reliable (consistent) but not valid: measures something consistently, but it doesn't measure what it's meant to measure.
  2. Unreliable (not consistent) and not valid: an inconsistent measure which doesn't measure what it's meant to measure.
  3. Unreliable (not consistent) but valid: measures what it's meant to measure, i.e., an unstable construct.
  4. Reliable (consistent) and valid: measures what it's meant to measure, i.e., a stable construct.
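For the inter-rater row in the methodology table above, the "amount of agreement" between two raters is often reported as Cohen's kappa, which corrects raw agreement for agreement expected by chance. The slides do not prescribe a statistic, so the helper below is an illustrative sketch with hypothetical pass/fail judgements from two examiners.

```python
import numpy as np

def cohen_kappa(rater_a, rater_b) -> float:
    """Cohen's kappa for two raters' categorical judgements."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    categories = np.union1d(a, b)
    p_observed = np.mean(a == b)  # raw proportion of agreement
    # Chance agreement, from each rater's marginal category rates.
    p_chance = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical pass (1) / fail (0) judgements on ten student projects.
examiner_1 = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
examiner_2 = np.array([1, 1, 0, 1, 1, 1, 0, 0, 1, 1])
print(f"Cohen's kappa = {cohen_kappa(examiner_1, examiner_2):.2f}")  # ~0.52
```

A kappa near 1 indicates near-perfect agreement, while values near 0 indicate agreement no better than chance.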
VARIOUS TYPES OF MEASUREMENT SCALES

| Particulars | Nominal Scale | Ordinal Scale | Interval Scale | Ratio Scale |
|---|---|---|---|---|
| Characteristics | Description | Order | Distance | Description, Order, Distance and Origin |
| Sequential Arrangement | Not Applicable | Applicable | Applicable | Applicable |
| Fixed Zero Point | Not Applicable | Not Applicable | Not Applicable | Applicable |
| Multiplication and Division | Not Applicable | Not Applicable | Not Applicable | Applicable |
| Addition and Subtraction | Not Applicable | Not Applicable | Applicable | Applicable |
| Difference between Variables | Non-Measurable | Non-Measurable | Measurable | Measurable |
| Mean | Not Applicable | Not Applicable | Applicable | Applicable |

| Scale | Description | Example | Type of Data | Mathematical Operations |
|---|---|---|---|---|
| Nominal | Data consists of names or categories. No ordering scheme is possible. | Number assigned to a runner in a race; a Social Security Number. | Discrete | Counting and % calculation |
| Ordinal (Ranking) | Data is arranged in some order, but differences between values cannot be determined or are meaningless. | Rank order of runners in a race. | Discrete | Counting and % calculation |
| Interval | Data is arranged in order and differences can be found. However, there is no inherent starting point, so ratios are meaningless. | Temperatures of three metal parts were 200 °F, 300 °F and 600 °F; note that 600 °F is not three times as hot as 200 °F. | Continuous | Addition and Subtraction |
| Ratio | An extension of the interval level that includes an inherent zero starting point. Both differences and ratios are meaningful. | Product A costs 300 and Product B costs 600; note that 600 is twice as much as 300. | Continuous | Addition, Subtraction, Multiplication, Division and all statistical techniques |
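The interval-vs-ratio distinction in the table can be made concrete: ratios are only meaningful on a scale with a true zero. The sketch below reuses the table's Fahrenheit example; converting to Kelvin (an absolute, ratio scale for temperature) shows that the "three times" reading fails, while the price comparison holds.

```python
def fahrenheit_to_kelvin(f: float) -> float:
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale)."""
    return (f - 32) * 5 / 9 + 273.15

# 600 F is NOT three times as hot as 200 F: on the absolute scale
# the ratio is only about 1.61.
print(fahrenheit_to_kelvin(600) / fahrenheit_to_kelvin(200))  # ~1.61

# Price has a true zero, so ratios are meaningful: Product B at 600
# really does cost twice as much as Product A at 300.
print(600 / 300)  # 2.0
```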
