• Reliability can be estimated by comparing different versions of the same measurement.
• Validity is harder to assess, but it can be estimated by comparing the results to other relevant data or theory.
• Methods of estimating reliability and validity are usually split into different types.

How do you assess reliability in research?

1. Test-retest reliability
• The test-retest reliability method measures the consistency of results when you repeat the same test on the same sample at a different point in time.
• If the results are similar each time you give the test to the sample group, your research method is likely reliable and not influenced by external factors, like the sample group's mood or the day of the week.
• Example: Give a group of college students a survey about their satisfaction with their school's parking lots on Monday and again on Friday, then compare the results to check the test-retest reliability.

2. Parallel forms reliability
• This strategy involves giving the same group of people multiple types of tests to determine whether the results stay the same when using different research methods.
• If they do, the methods are likely reliable; otherwise, the participants in the sample group may be behaving differently and changing the results.
• For this strategy to succeed, it's important that:
• Each research method looks for the same information
• The group of participants behaves similarly for each test
• Example: In marketing, you might interview customers about a new product, observe them using the product, and give them a survey about how easy the product is to use, then compare these results as a parallel forms reliability test.

3. Inter-rater reliability
• Inter-rater reliability testing involves multiple researchers assessing the same sample group and comparing their results.
• If most of the results from the different assessors are similar, the research method is likely reliable and can produce usable research, because the assessors gathered the same data from the group.
• This is useful for research methods where each assessor may apply different criteria but can still end up with similar results, like:
• Observations
• Interviews
• Surveys
• Example: Multiple behavioral specialists may observe a group of children playing to determine their social and emotional development, then compare notes to check for inter-rater reliability.

4. Internal consistency reliability
• Checking for internal consistency involves making sure your research methods, or parts of them, deliver the same results.
• There are two typical ways to make this determination:
• Split-half reliability test: Split a research method, like a survey or test, in half, deliver both halves to a sample group separately, then compare the results. If the results are consistent, the research method is likely reliable.
• Inter-item reliability test: Administer multiple test items to sample groups, as in parallel forms reliability testing, and calculate the correlation between the results for each item. You then average these correlations and use that number to determine whether the results are reliable.
• Example: You may give a company's cleaning department a questionnaire about which cleaning products work best, but split it in half, give each half to the department separately, and calculate the correlation between the halves to test for split-half reliability.

VALIDITY

1. Construct validity
• Construct validity is the adherence of a measure to existing theory and knowledge of the concept being measured.
• There are two main types of construct validity:
• Convergent validity: The extent to which your measure corresponds to measures of related constructs
• Discriminant validity: The extent to which your measure is unrelated or negatively related to measures of distinct constructs
• Example: A self-esteem questionnaire could be assessed by measuring other traits known or assumed to be related to self-esteem (such as social skills and optimism). Strong correlation between the scores for self-esteem and the associated traits would indicate high construct validity.

2. Content validity
• Content validity is the extent to which the measurement covers all aspects of the concept being measured.
• To produce valid results, the content of a test, survey, or measurement method must cover all relevant parts of the subject it aims to measure.
• If some aspects are missing from the measurement (or if irrelevant aspects are included), the validity is threatened.
• Content validity applies to any context where you create a test or questionnaire for a particular construct and want to ensure that the questions actually measure what you intend them to.
• Example: A test that aims to measure a class of students' level of Spanish contains reading, writing, and speaking components, but no listening component. Experts agree that listening comprehension is an essential aspect of language ability, so the test lacks content validity for measuring the overall level of ability in Spanish.

3. Face validity
• Face validity is about whether a test appears, on the surface, to measure what it's supposed to measure.
• This type of validity is concerned with whether a measure seems relevant and appropriate for what it's assessing.
• To assess face validity, you ask other people to review your measurement technique and items and gauge their suitability for measuring your variable of interest.
• Ask them the following questions:
• Are the components of the measure (e.g., questions) relevant to what's being measured?
• Does the measurement method seem useful for measuring the variable?
• Is the measure seemingly appropriate for capturing the variable?
• You can create a short questionnaire to send to your test reviewers, or you can informally ask them whether the test seems to measure what it's supposed to.
• Example: You create a survey to measure the regularity of people's dietary habits. You review the survey items, which ask about every meal of the day and the snacks eaten in between for every day of the week. On its surface, the survey seems like a good representation of what you want to test, so you consider it to have high face validity.

4. Criterion validity
• Criterion validity is the extent to which the result of a measure corresponds to other valid measures of the same concept.
• It evaluates how well a test can predict a concrete outcome, or how well the results of your test approximate the results of another test.
• To evaluate criterion validity, you calculate the correlation between the results of your measurement and the results of the criterion measurement.
• If there is a high correlation, this is a good indication that your test is measuring what it intends to measure.
• Example: A survey is conducted to measure the political opinions of voters in a region. If the results accurately predict the later outcome of an election in that region, the survey has high criterion validity.
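
Several of the checks above (test-retest, parallel forms, split-half, inter-item, and criterion validity) come down to the same computation: the correlation between two sets of scores from the same participants. A minimal sketch in Python, using made-up scores purely for illustration (the data and the scale are hypothetical, not from any real study):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores: the same 5 participants measured twice
# (e.g. Monday vs. Friday for test-retest, or the two halves
# of a split survey for split-half reliability).
monday = [4, 7, 6, 8, 5]
friday = [5, 7, 6, 9, 5]
print(round(pearson_r(monday, friday), 3))  # → 0.945, near 1: consistent

# Inter-item reliability sketch: average the pairwise correlations
# between several (hypothetical) item-score lists.
items = [[4, 7, 6, 8, 5], [5, 7, 6, 9, 5], [4, 6, 7, 8, 6]]
pairs = [(0, 1), (0, 2), (1, 2)]
avg_r = sum(pearson_r(items[i], items[j]) for i, j in pairs) / len(pairs)
print(round(avg_r, 3))  # average inter-item correlation
```

A correlation near 1 suggests the two administrations (or halves, or items) agree, supporting reliability; for criterion validity, the same calculation would compare your measure against the criterion measure instead.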