
RESEARCH INSTRUMENT AND DATA COLLECTION IN LANGUAGE LEARNING:
VALIDITY AND RELIABILITY OF RESEARCH DATA AND TRIANGULATION OF THE STUDY

A. Research Instrument
The term research instrument refers to any tool used to collect, measure, or analyse data relevant to the subject of a research project. Research instruments are widely used in the social sciences and health sciences, and they also appear in education research involving patients, staff, teachers, and students. A research instrument may take the form of a questionnaire, survey, interview, checklist, or simple test. The choice of a specific research instrument rests with the researcher and is strongly tied to the methods used in the particular study.
1. Research Instrument Examples
There are many examples of research instruments. The most common ones are
interviews, surveys, observations, and focus groups. Let's break them down one by
one.
a. Research instrument: Interviews
The interview is a qualitative research method that collects data by asking questions. It includes three main types: structured, unstructured, and semi-structured interviews.
- Structured interviews include an ordered list of questions. These questions are often closed-ended and elicit a yes, a no, or a short answer from the respondents. Structured interviews are easy to execute but leave little room for spontaneity.
- Unstructured interviews are the opposite of structured interviews. Questions
are mostly open-ended and are not arranged in order. The participants can
express themselves more freely and elaborate on their answers.
- Semi-structured interviews are a blend of structured and unstructured
interviews. They are more organised than unstructured interviews, though not
as rigid as structured interviews.
Compared to other research instruments, interviews provide more reliable results and allow the interviewers to engage and connect with the participants. However, they require experienced interviewers to draw the best responses from the interviewees.
Tools used in interviews may include:
- Audio recorder (face-to-face interviews)
- Camcorder and video-conferencing tools (online interviews)
b. Research instrument: Survey
Survey research is another primary data collection method that involves
asking a group of people for their opinions on a topic. However, surveys are often
given out in paper form or online instead of meeting the respondents face-to-face.
The most common form of a survey is a questionnaire: a list of questions used to collect opinions from a group. These questions can be closed-ended or open-ended, offer pre-selected answers, or use rating scales. Participants can receive the same or alternate questions.
The main benefit of a survey is that it is a cheap way to collect data from a large group. Most surveys are also anonymous, making people more comfortable sharing honest opinions. However, this approach does not guarantee a response, since people tend to ignore surveys in their email inboxes or in stores. There are many types of surveys, including paper and online surveys.
c. Research instrument: Observation
Observation is another research instrument for collecting data. It involves an observer watching people interacting in a controlled or uncontrolled environment. An example is watching a group of kids playing and seeing how they interact, which kid is most popular in the group, and so on.
Observation is easy to execute and provides highly accurate results. However, these results may be subject to observer bias (the observers' opinions and prejudices), which lowers their fairness and objectivity. Some types of observation can also be costly.

d. Research Instrument: Focus groups
Focus groups are similar to interviews but include more than one participant. They are also a qualitative research method which aims to understand respondents' opinions on a topic.
Focus groups often consist of one moderator and a group of participants. Sometimes there are two moderators, one directing the conversation and the other observing.

e. Research Instrument: Existing data
Unlike the others, existing or secondary data is an instrument for secondary research. Secondary research means using data that another researcher has already collected.
Secondary data can save a lot of research time and budget. Sources are also
numerous, including internal (within the company) and external (outside the
company) sources.
f. Research Instrument Design
Research instrument design means creating research instruments that yield high-quality, reliable, and actionable results. It is an intricate process that requires a lot of time and effort from the researchers.

A few things to keep in mind when designing a research instrument:

1. Validity means how well the participants' answers match those outside the study.
2. Reliability means whether the research method will produce similar results multiple times.
3. Replicability means whether the research results can be used for other research purposes.
4. Generalisability means whether the research data can be generalised or applied to the whole population.

B. Data Collection
What is Data Collection: A Definition
Before we define data collection, it is essential to ask the question, "What is data?" The abridged answer: data is various kinds of information formatted in a particular way. Data collection, therefore, is the process of gathering, measuring, and analyzing accurate data from a variety of relevant sources to find answers to research problems, answer questions, evaluate outcomes, and forecast trends and probabilities.
Our society is highly dependent on data, which underscores the importance of collecting it. Accurate data collection is necessary to make informed business decisions, ensure quality assurance, and maintain research integrity. During data collection, the researchers must identify the data types, the sources of data, and the methods being used.
Additionally, we can break up data into qualitative and quantitative types.
Qualitative data covers descriptions such as color, size, quality, and appearance.
Quantitative data, unsurprisingly, deals with numbers, such as statistics, poll numbers,
percentages, etc.
1. Specific Data Collection Techniques
Let’s get into specifics. Using the primary/secondary methods mentioned above,
here is a breakdown of specific techniques.
a. Primary Data Collection
1) Interviews
The researcher asks questions of a large sample of people, either in direct interviews or by means of mass communication such as phone or mail. This method is by far the most common means of data gathering.
2) Projective Data Gathering
Projective data gathering is an indirect interview, used when potential
respondents know why they're being asked questions and hesitate to answer.
For instance, someone may be reluctant to answer questions about their phone
service if a cell phone carrier representative poses the questions. With
projective data gathering, the interviewees get an incomplete question, and
they must fill in the rest, using their opinions, feelings, and attitudes.

3) Delphi Technique
The Oracle at Delphi, according to Greek mythology, was the high
priestess of Apollo’s temple, who gave advice, prophecies, and counsel. In the
realm of data collection, researchers use the Delphi technique by gathering
information from a panel of experts. Each expert answers questions in their
field of specialty, and the replies are consolidated into a single opinion.
4) Focus Group
Focus groups, like interviews, are a commonly used technique. The group
consists of anywhere from a half-dozen to a dozen people, led by a moderator,
brought together to discuss the issue.
5) Questionnaires
Questionnaires are a simple, straightforward data collection method. Respondents get a series of questions, either open-ended or closed-ended, related to the matter at hand.
b. Secondary Data Collection
Unlike primary data collection, there are no specific collection methods.
Instead, since the information has already been collected, the researcher consults
various data sources, such as:
1) Financial Statements
2) Sales Reports
3) Retailer/Distributor/Deal Feedback
4) Customer Personal Information (e.g., name, address, age, contact info)
5) Business Journals
6) Government Records (e.g., census, tax records, Social Security info)
7) Trade/Business Magazines
8) The internet
2. Data Collection Tools
Now that we’ve explained the various techniques, let’s narrow our focus even
further by looking at some specific tools. For example, we mentioned interviews as a
technique, but we can further break that down into different interview types (or
“tools”).
a. Word Association
The researcher gives the respondent a set of words and asks them what
comes to mind when they hear each word.
b. Sentence Completion
Researchers use sentence completion to understand what kind of ideas the
respondent has. This tool involves giving an incomplete sentence and seeing
how the interviewee finishes it.
c. Role-Playing
Respondents are presented with an imaginary situation and asked how they would act or react if it were real.
d. In-Person Surveys
The researcher asks questions in person.
e. Online/Web Surveys
These surveys are easy to accomplish, but some users may be unwilling to
answer truthfully, if at all.
f. Mobile Surveys
These surveys take advantage of the increasing proliferation of mobile
technology. Mobile collection surveys rely on mobile devices like tablets or
smartphones to conduct surveys via SMS or mobile apps.
g. Phone Survey
No researcher can call thousands of people at once, so they need a third
party to handle the chore. However, many people have call screening and
won’t answer.
h. Observation
Sometimes, the simplest method is the best. Researchers who make direct
observations collect data quickly and easily, with little intrusion or third-party
bias. Naturally, it’s only effective in small-scale situations.

C. Validity of Research Data on Language Learning
1. DEFINITION OF VALIDITY
Valid means correct. When we claim that the result of a students' writing assessment is valid, we are convinced that the writing assessment result correctly reflects how well the students can write. The correctness of the assessment is called validity, and the evidence to support the correctness of the assessment is called validity evidence.
Students' English proficiency consists of several skills (reading, writing, speaking, and listening) and knowledge of language components (vocabulary and grammar). When we want to assess the students' writing skill, for example, we have to use assessment instruments that can correctly reflect the students' writing skills. If for some reason we choose the wrong assessment instrument, the result of the assessment will not be a valid representation of the students' writing skill. When we want to assess students' speaking skill but use an assessment instrument that gives a paper-and-pencil task, the result of the assessment might not be a valid representation of the students' speaking skills. A reading comprehension test may assess students' knowledge of vocabulary more than their reading comprehension skills if the test consists of questions on the meaning of the words used in the text; the scores from such a test may not be a valid representation of the students' reading comprehension skill. Likewise, if the actual performance of the students does not match the scores from our assessment (students who get low scores from our speaking assessment perform better in real communication than students who get high scores), then the results of our speaking assessment have a validity problem.
2. The Place of Validity
Validity is not a characteristic of the assessment instruments used to collect the data; it is attached to the results of the assessment. The question "Is this writing instrument valid?" is therefore not quite appropriate, since it is not the instrument that is valid. An assessment instrument designed, even by professional testing experts, to measure students' knowledge of grammar can never be used to assess students' writing skills; or rather, the result of the assessment cannot be used as a valid representation of students' writing skills. So no matter how good an assessment instrument is, it can never become a valid instrument in itself, since validity always depends on the purpose for which the instrument is used.
3. Predicting Validity
The validity of the result of an assessment is something abstract, but it can be predicted based on the evidence that supports it. The stronger the supporting validity evidence is, the more valid the result of the assessment is. The absence of supporting validity evidence signals validity problems with the result of the assessment. Validity evidence can be provided from the assessment instrument used and from empirical data. From the assessment instrument, construct validity evidence and content validity evidence can be provided, while from empirical data, concurrent validity evidence and predictive validity evidence can be provided. In short, there are four kinds of supporting validity evidence: construct, content, concurrent, and predictive. The topic of the discussion is thus not how many kinds of validity there are, but how many kinds of supporting validity evidence there are.
4. Construct Validity Evidence
Construct validity evidence comes from the assessment instrument used. An assessment instrument is always designed to measure specific knowledge or skills of a group of people; no single assessment instrument can be used to measure everything from everybody. So the development of an assessment instrument must always start from the defined construct of the knowledge or skills to be assessed. The construct defined will lead to the tasks the instrument requires students to do. The correct definition of the construct will lead to the correct selection of the task, which will result in correct data with strong validity. If the task the students are required to do (writing a text, for example) matches (is believed to allow the students to show) the writing skills to be assessed, then the match between the task and the purpose of the assessment becomes the construct validity evidence of the data to be collected. The absence of a match between the task and the purpose of the assessment signals a construct validity problem with the data to be collected. As Ebel and Frisbie (1986: 91) put it, "The meaning of the test scores is derived from the nature of the tasks examinees are asked to perform."
If an assessment instrument used to measure students' writing skills does not require the students to perform a writing activity, the scores obtained will not be a valid representation of the students' writing skill; these scores suffer from a construct validity problem. So to provide construct validity evidence that supports the validity of the result of a writing assessment, the writing assessment instrument must require the students to perform a writing activity.
Likewise, if an assessment instrument used to measure students' speaking ability does not require the students to perform a speaking activity, the scores obtained will not be a valid representation of the students' speaking skill, and the scores suffer from a construct validity problem. But if the instrument requires the students to perform a speaking activity and the scores are judged on the basis of the students' speaking performance, then the interpretation of the students' speaking skill is valid, as it is supported with strong construct validity evidence.
It is important, therefore, for the tasks students are required to do to be stated very clearly, so that the students whose skill is to be measured know well what they have to perform.
5. Content Validity Evidence
In construct validity, the task the students are required to perform becomes the evidence: if the task matches the purpose of the assessment, the match becomes the construct validity evidence, and the absence of such a match leads to a construct validity problem with the result of the assessment. In content validity, the coverage of the tasks becomes the evidence: the tasks should representatively cover the content the assessment is intended to measure.
6. Concurrent Validity Evidence
Unlike evidence of construct and content validity, evidence of empirical validity comes from another set of data resulting from another assessment for the same purpose (Heaton, 1988: 161). The term used for this empirical validity evidence is criterion-related evidence for validity (Ebel & Frisbie, 1986: 95).
Students' scores from a classroom English proficiency test can be compared to the scores the same students get from TOEFL (Test of English as a Foreign Language), as both aim at measuring English proficiency. The scores from TOEFL are trustworthy because TOEFL has been internationally standardized. When the two sets of English proficiency scores are correlated, the correlation shows the level of the validity of the scores from the classroom English proficiency test. A high, positive correlation between the two sets of scores becomes strong evidence of concurrent validity for the scores from the classroom English proficiency test; in the same way, a low correlation shows that the scores from the classroom English proficiency test have concurrent validity problems. A high, positive correlation between the two sets of scores means that those who get high scores from the classroom English proficiency test also get high scores from TOEFL, and similarly, those who get lower scores from the classroom English proficiency test get lower scores from TOEFL.
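As a minimal sketch of how such a correlation could be computed, the snippet below uses Python's standard library; the score lists are made-up values for ten students, purely for demonstration and not taken from the handout:

```python
# Minimal sketch: concurrent validity evidence as the correlation
# between classroom test scores and an established criterion test.
# All scores below are made-up values for demonstration only.
from statistics import correlation  # Pearson's r (Python 3.10+)

classroom_scores = [62, 75, 58, 88, 70, 93, 55, 80, 67, 72]
toefl_scores = [480, 530, 460, 570, 510, 600, 450, 550, 490, 520]

r = correlation(classroom_scores, toefl_scores)
print(f"Pearson r = {r:.2f}")
# A high, positive r is strong concurrent validity evidence for the
# classroom scores; a low r signals a concurrent validity problem.
```

A value of r close to +1 would mean that high classroom scorers are also high TOEFL scorers, exactly the pattern described above.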
7. Predictive Validity Evidence
Very often an assessment is conducted to predict how well someone will perform at a later time. An entrance test to a university, for example, is aimed at predicting how well students can perform at the university. Only those who get high scores on the entrance test are admitted, because they are predicted to perform well in their study at the university, while those who do not get high scores are not admitted, because they are predicted to perform poorly. The scores of those who are admitted can be ranked from the highest scorer to the lowest. If the scores have strong predictive power, then the higher the score a student gets on the entrance test, the better the student is predicted to perform in their study, as sketched below.
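One illustrative way to check this ranking idea is the Spearman rank correlation, the Pearson correlation of the two rank orders. The entrance scores, later grade averages, and helper function below are assumptions for demonstration only:

```python
# Minimal sketch: predictive validity as the agreement between the
# ranking by entrance-test score and the ranking by a later criterion
# (here, first-year grade averages). All numbers are made up.
from statistics import correlation

def ranks(values):
    """Rank values from 1 (lowest) upward (no tie handling needed here)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

entrance = [85, 78, 92, 66, 74, 88, 70, 81]
first_year = [3.4, 3.1, 3.8, 2.6, 2.9, 3.6, 2.8, 3.2]

# Spearman's rho = Pearson correlation of the two rank lists.
rho = correlation(ranks(entrance), ranks(first_year))
print(f"Spearman rho = {rho:.2f}")  # close to +1.0 = strong predictive power
```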
D. Reliability of Research Data on Language Learning
While validity refers to the degree of correctness of an assessment result in representing the skill being assessed (the extent to which the result of a language skill assessment does not mistakenly represent another language skill, or the extent to which the result of a speaking skill assessment does not mistakenly represent mere knowledge about speaking), reliability refers to the preciseness of the language skill assessment result in representing the actual level of the skill of the examinees. The result of a language skill assessment has high reliability if the result precisely represents (is very close to, does not stray far from, gives a good estimate of, neither overestimates nor underestimates) the true level of the skill being assessed. In other words, if the language skill assessment result is too different from the true level of the skill being assessed, then the assessment result has low reliability. The distance between the true level of the skill and the assessment result, then, determines the degree of reliability: the bigger the distance between the language skill assessment result and the actual level of the skill being assessed, the lower the reliability of that assessment result. This distance represents errors in the assessment result. In other words, the bigger the errors in the assessment result, the bigger the distance between the assessment result and the actual level of the skill being assessed, and the lower the reliability of that assessment.
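This distance-and-error picture matches the standard classical test theory framing, a common formalization that is not quoted in the handout itself but is consistent with the description above:

```latex
X = T + E, \qquad
\text{reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)}
                   = 1 - \frac{\operatorname{Var}(E)}{\operatorname{Var}(X)}
```

Here X is the observed assessment result, T the true level of the skill, and E the error; the larger the share of error variance in the observed scores, the lower the reliability.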
1. FACTORS AFFECTING THE DEGREE OF RELIABILITY
By way of contrast, the main factor affecting the validity of a language skill assessment result is the appropriateness of the procedure of the assessment (the appropriateness of the choice of instrument). An assessment of speaking skill using a paper-and-pencil test, which requires the examinees to show their speaking skill by writing and estimates the speaking skill from that writing, will result in speaking scores with low validity (weak construct validity evidence), meaning that the speaking scores represent knowledge about speaking rather than the skill of speaking. If the assessment of speaking skill is administered using an interview test, which requires the examinees to show the skill of speaking by actually talking and estimates the speaking skill from that talking, then the result of the speaking assessment (the scores) will have higher validity (stronger construct validity evidence). The factors affecting reliability, discussed below, are of a different kind.
a. Not the Examinees' Best Performance
Errors in assessment that cause the scores to underestimate the true level of the skill being assessed may happen because the examinees are not in their best condition when the assessment is administered, due to physical as well as emotional constraints. They may be sick, tired, hungry, emotionally disturbed, unable to concentrate, or sleepy while the assessment is conducted. Doping or any situation that makes the examinees overactive or excessively happy, on the other hand, can cause the assessment to result in scores which overestimate the actual level of the skill being assessed. To avoid the errors of either underestimating or overestimating the true level of the skill being assessed, therefore, the assessment should be conducted in a situation in which these constraints are minimized. The assessor should select the most conducive atmosphere to make sure that the examinees are not suffering from those constraints while the assessment is being administered.
b. Not the Raters' Most Objective Judgment
Like the errors coming from the examinees' physical and emotional constraints, errors in assessment that cause the scores to underestimate the true level of the skill being assessed may happen because the raters who judge the quality of the skill being assessed are not in their most natural and objective physical and emotional mode. They may be sick, tired, hungry, emotionally disturbed, unable to concentrate, or sleepy while giving judgement. Doping or any situation that makes the raters overactive or excessively happy, on the other hand, can cause the judgement to result in scores which overestimate the actual level of the skill being assessed. To avoid these errors of underestimation or overestimation, the judgement process should be conducted in a situation in which the constraints are minimized. The raters should select the most conducive atmosphere to make sure that they are not suffering from those constraints while giving the judgement.
c. The Assessment Instrument Being Too Short
An assessment of knowledge of English grammar which asks 100 questions will result in scores with higher reliability than the same assessment with only 25 questions. Similarly, an interview to assess speaking skill which allows 30 minutes for each examinee to talk will result in speaking scores with higher reliability than the same interview which allows only 5 minutes per examinee. In short, an assessment which asks more questions, or which allows more time for the examinees to show their performance, will result in scores with higher reliability than the same assessment with fewer questions or less time (Ebel & Frisbie, 1985: 84).
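The effect of lengthening a test can be quantified with the Spearman-Brown formula, a standard psychometric result not cited in the handout itself. The sketch below assumes, purely for illustration, that the 25-question version has a reliability of 0.70:

```python
# Minimal sketch: Spearman-Brown prediction of reliability when a
# test is lengthened by a factor k (standard psychometric formula).
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when test length is multiplied by k."""
    return k * r / (1 + (k - 1) * r)

r_25 = 0.70  # assumed reliability of the 25-question version
print(f"100 questions (4x longer): {spearman_brown(r_25, 4):.2f}")  # ~0.90
```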
d. The Assessment Instrument Content Being Heterogeneous
An assessment of written integrated English skills which covers the skill of reading, the skill of writing, the knowledge of grammar, and the knowledge of vocabulary will result in scores with lower reliability than an assessment of only one specific language skill or one item of knowledge of a language component, such as an assessment of the skill of reading, the skill of writing, the knowledge of grammar, or the knowledge of vocabulary. In other words, an instrument with a number of questions designed to assess many different language skills or language components (heterogeneous) will result in scores with lower reliability than an instrument with the same number of questions designed to assess one specific language skill or one specific language component (homogeneous) (Ebel & Frisbie, 1985: 84).
e. The Assessment Questions Being Too Easy or Too Difficult
An assessment of listening comprehension skill, for example, which asks questions so difficult that only 10 percent of the examinees can answer them all correctly, or so easy that almost all of the examinees can answer them all correctly, will result in scores with lower reliability than the same assessment which asks questions of moderate difficulty that about 35 to 85 percent of the examinees can answer correctly (Ebel & Frisbie, 1985: 85). So the level of difficulty of the questions in the assessment instrument influences the degree of reliability.
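A quick way to check whether items fall in that moderate band is to compute each item's difficulty index, the proportion of examinees answering it correctly. The response matrix below is a made-up example for illustration:

```python
# Minimal sketch: item difficulty as the proportion of examinees
# answering each question correctly. Rows = examinees, columns = items;
# 1 = correct, 0 = incorrect. The matrix is made-up demonstration data.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
]

n_examinees = len(responses)
for item in range(len(responses[0])):
    p = sum(row[item] for row in responses) / n_examinees
    verdict = "moderate" if 0.35 <= p <= 0.85 else "too easy or too difficult"
    print(f"item {item + 1}: p = {p:.2f} ({verdict})")
```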
f. The Type and Quality of the Assessment Instrument
An instrument designed to assess knowledge of vocabulary which contains 100 multiple-choice questions could produce scores with the same degree of reliability as the same assessment instrument containing 150 true-false questions. Likewise, a multiple-choice instrument with more plausible distractors can produce scores with higher reliability than the same instrument with less plausible distractors. So the type of instrument and the quality of the distractors influence the degree of reliability (Ebel & Frisbie, 1985: 85).
g. Cheating in the Assessment
If the examinees are not strictly watched during the assessment process, they might copy each other's answers or copy from notes they have prepared. If such cheating happens, the assessment will result in scores with low reliability. So the honesty of the examinees in answering the assessment questions affects the degree of reliability.
h. Uncomfortable Place and Time of Assessment
An assessment conducted in an uncomfortable room (too hot, too cold, too windy, too crowded, too small, or too noisy) or at an uncomfortable time (at 2.00 p.m., after the examinees have worked all morning, for example) will result in assessment scores with low reliability.
2. ESTIMATING THE DEGREE OF RELIABILITY
Estimating the degree of reliability of scores, or of the results of a language skill assessment, means verifying or confirming whether the scores or results have a high degree of preciseness. The evidence assumed to indicate a high degree of preciseness is the consistency of the scores. It is assumed that if the scores precisely represent the actual level of the skill being assessed, or if the errors they contain are not too big (if the distance between the scores and the actual level of the skill being assessed is not too big), then the scores or the result of the assessment will be consistent. Conversely, if the scores do not precisely represent the actual level of the skill being assessed, or if the errors they contain are too big (if the distance between the scores and the actual level of the skill is too big), then the scores or the results of the assessment will not be consistent.
Estimating the degree of reliability, therefore, refers to an effort to collect evidence of consistency to verify or confirm the reliability. In other words, if the scores or the results of an assessment are provided with evidence of high consistency, then the scores or results convincingly have a high degree of reliability or preciseness.
For the scores or results of conventional testing (from a single, one-shot test), evidence of consistency is highly needed, because single-shot testing potentially suffers from several problems (see the factors affecting the degree of reliability above).
The evidence of consistency may be collected by computing the correlational index of two sets of scores from two administrations of the test (the test-retest method), by correlating the scores from one test with the scores from its parallel form (the equivalent-form method), by computing the split-half correlation, by computing internal consistency using statistical formulas such as KR-20 or KR-21, and by cross-checking inter-rater agreement (see Djiwandono, 1996: 99 and Ebel & Frisbie, 1986: 75 for more details).
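As one concrete example among those methods, the KR-20 internal-consistency index can be computed directly from a matrix of right/wrong answers. The response matrix below is made up for demonstration; the formula is the standard Kuder-Richardson 20, with the population variance of the total scores used here:

```python
# Minimal sketch: KR-20 internal consistency for right/wrong (1/0) items.
# KR-20 = k/(k-1) * (1 - sum(p*q) / variance of the total scores),
# where p is the proportion answering an item correctly and q = 1 - p.
from statistics import pvariance  # population variance of total scores

# Made-up response matrix: rows = examinees, columns = items.
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0],
]

k = len(responses[0])                     # number of items
n = len(responses)                        # number of examinees
totals = [sum(row) for row in responses]  # each examinee's total score

pq_sum = 0.0
for item in range(k):
    p = sum(row[item] for row in responses) / n
    pq_sum += p * (1 - p)

kr20 = (k / (k - 1)) * (1 - pq_sum / pvariance(totals))
print(f"KR-20 = {kr20:.2f}")  # closer to 1.0 = more consistent scores
```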
For the results of alternative classroom assessment, which are relatively safer from the factors affecting the degree of reliability, evidence of high consistency is already provided through the process of the assessment itself. The result is not based on a single, one-shot assessment; it is instead based on several assessments throughout the whole process of instruction, from the first minutes to the last minutes of instruction and from the first lesson to the last lesson in the quarter or the semester.
E. Triangulation in Research
What is triangulation?
Triangulation is a method used to increase the credibility and validity of research findings [1]. Credibility refers to trustworthiness and how believable a study is; validity is concerned with the extent to which a study accurately reflects or evaluates the concept or ideas being investigated [2]. Triangulation, by combining theories, methods, or observers in a research study, can help ensure that fundamental biases arising from the use of a single method or a single observer are overcome. Triangulation is also an effort to help explore and explain complex human behaviour using a variety of methods, in order to offer a more balanced explanation to readers [2]. It is a procedure that enables the validation of data and can be used in both quantitative and qualitative studies.
Triangulation can enrich research, as it offers a variety of datasets to explain differing aspects of a phenomenon of interest. It also helps refute a supposition where one dataset invalidates it, and it can assist in confirming a hypothesis where one set of findings confirms another set. Finally, triangulation can help explain the results of a study [3]. Central to triangulation is the notion that methods leading to the same results give more confidence in the research findings.
Four types of triangulation are proposed by Denzin:
a. data triangulation, which includes matters such as periods of time, space, and people
b. investigator triangulation, which includes the use of several researchers in a study
c. theory triangulation, which encourages several theoretical schemes to enable the interpretation of a phenomenon
d. methodological triangulation, which promotes the use of several data collection methods, such as interviews and observations
