SOLVING TEST
Faculty of Science
Universiti Teknologi Malaysia
JANUARY 2015
DEDICATION
ACKNOWLEDGEMENT
In preparing this dissertation, I was in contact with many people who contributed
to my understanding and thoughts. In particular, I would like to express my sincere
appreciation to my thesis supervisor, Dr. Norazlina, for her guidance and
encouragement throughout this study.
Last but not least, I would like to thank all the lecturers and friends who guided
me, directly or indirectly, in completing this dissertation, especially Mariam, who
gave me invaluable assistance throughout my research work. I thank them all for
their kindness and moral support.
Thank you.
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENTS iv
ABSTRACT v
ABSTRAK vi
TABLE OF CONTENTS vii
LIST OF TABLES xi
LIST OF FIGURES xii
LIST OF ABBREVIATIONS xiii
LIST OF SYMBOLS xiv
LIST OF APPENDICES xv
1 INTRODUCTION 1
1.1 Introduction 1
1.2 Background of the Study 3
1.3 Problem Statement 4
1.4 Objectives of the Study 4
1.5 Significance of the Study 5
1.6 Scope of the Study 5
1.7 Definition of Terms 6
1.7.1 Latent Trait 6
1.7.2 Logit 6
1.7.3 Rating Scale Model 7
1.8 Outline of the Study 7
2 LITERATURE REVIEW 8
2.1 Introduction 8
2.2 Rasch Model 8
2.2.1 Fit Statistics 11
2.2.2 Misfit 11
2.2.3 Person and Item Reliability 12
2.2.4 Person and Item Distribution Map 12
2.2.5 Internal Consistency 13
2.3 Critical Thinking 13
2.4 Problem Solving 14
2.5 Summary 15
3 RESEARCH METHODOLOGY 16
3.1 Introduction 16
3.2 Research Framework 16
3.3 Rasch Model Analysis 16
3.3.1 Identify the Reliability of the Instrument 17
3.3.1.1 Rasch Reliability 18
3.3.1.2 Internal Consistency 18
3.3.2 Identify the Validity of the Instrument 19
3.3.2.1 Infit and Outfit Mean Square 19
3.3.2.2 Standardized Fit Statistics 21
3.3.2.3 Point Measure Correlation 22
3.3.2.4 Person and Item Separation 22
3.3.3 Identify the Person Performance and Item Difficulties of the Instrument 23
3.3.4 Identify the Misfit Item in the Instrument 24
3.4 Descriptive Summary 24
3.4.1 Respondents of the Study 25
3.4.2 Mode 25
3.5 Research Instrument 25
3.6 Summary 31
5 DATA ANALYSIS 48
5.1 Introduction 48
5.2 Respondents’ Demographic 48
5.2.1 Gender Distribution 48
5.2.2 Race Distribution 49
5.2.3 Faculty Distribution 49
5.3 Critical Thinking Problem Solving Level 51
5.4 Summary 53
REFERENCES 57
Appendix A 61 - 63
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
AA - Aspect of Assessment
CTPS - Critical Thinking Problem Solving
CTPST - Critical Thinking Problem Solving Test
FAB - Faculty of Built Environment
FBME - Faculty of Biosciences and Medical Engineering
FC - Faculty of Computing
FChE - Faculty of Chemical Engineering
FGHT - Faculty of Geoinformation and Real Estate
FKA - Faculty of Civil Engineering
FKE - Faculty of Electrical Engineering
FM - Faculty of Management
FPREE - Faculty of Petroleum and Renewable Energy Engineering
FS - Faculty of Science
MJIIT - Malaysian-Japan International Institute of Technology
MNSQ - Mean Square
OMNSQ - Outfit Mean Square
PIDM - Person and Item Distribution Map
PMC - Point Measure Correlation
RS - Razak School of Engineering and Technology
SA - Adjusted Standard Deviation
SE - Average Measurement Error
SPSS - Statistical Package for the Social Sciences
UTM - Universiti Teknologi Malaysia
WASI - Whimbey Analytical Skills Inventory
WGCTA - Watson Glaser Critical Thinking Appraisal
ZSTD - Outfit Z-Standard
LIST OF SYMBOLS
α - Cronbach's Alpha
μ - Mean
% - Percentage
Q - Question
LIST OF APPENDICES
CHAPTER 1

INTRODUCTION
1.1 Introduction
For physical traits, such as height, numbers can be assigned directly, for example
with a ruler. However, psychological traits such as ability or proficiency are
constructs: they are unobservable, but they can be measured indirectly through a
test (Khairani and Razak, 2012). Therefore, for tests that relate observable
quantities (such as test scores) to unobservable traits (such as ability or
proficiency), researchers apply the Rasch model.
good decisions. However, do these students have critical thinking skills and the
ability to apply those skills in many different contexts? Can deans or program
directors at colleges and universities ensure that graduating students are able to
think critically in complex situations?
Rasch (1960), as cited in Othman et al. (2011), declared that the Rasch model is
a reliable and suitable way of assessing students' ability. Ghulman and Mas'odi
(2009) stated that Rasch measurement is beneficial because its predictive feature
can overcome missing data.
A study by Saidfudin et al. (2010) showed that the Rasch model can categorize
grades into learning outcomes more accurately, especially when dealing with a
small number of sampling units. Aziz et al. (2008) also applied the Rasch model to
The Rasch model supports generalizability across samples and items, allows for
testing of unidimensionality, produces an ordered set of items, and identifies
poorly functioning items as well as unexpected responses. In this study, the
ability to solve problems involving critical thinking skills is evaluated. Given
these problems, the study sets out to determine the effectiveness of the Critical
Thinking Problem Solving Test (CTPST) in developing this ability, and the level of
critical thinking problem solving abilities by faculty.
In view of the requirements and problems stated above, the present research aims
at the following main objectives:
(ii) To identify the critical thinking level in problem solving for each
faculty using Winsteps 3.81 and the Statistical Package for the Social
Sciences (SPSS) version 16.0.
This study focuses on establishing the reliability and validity of the questions
and of the students' performance. The Winsteps computer software can handle large
numbers of respondents and items with little computational effort. The main
contributions of the research are summarized as follows:
(i) Analyze the reliability and validity of the problems using Winsteps.
In this study, routine and non-routine problems are used as the assessment tool.
The respondents are first year undergraduate students from selected faculties in
UTM: 981 students in total, of whom 441 are male and 540 are female. The sample
was chosen randomly to obtain more accurate results.
The instrument for this study is the Critical Thinking Problem Solving Test
(CTPST). The collected data will be processed with Winsteps software version
3.81.0, whose output will be used to interpret the validity and reliability of the
CTPST in terms of person and item separation respectively, misfit items and
unidimensionality. In addition, the Statistical Package for the Social Sciences
(SPSS) version 16.0 will be used to determine the critical thinking level for each
faculty.
In this study, a few terms related to the Rasch model are used; they are defined
below.

1.7.1 Latent Trait

This term refers to certain human attributes that are not directly measurable. In
latent trait theory, a person's performance can be quantified, and the values are
used to interpret and explain the person's test response behavior. Trait and
ability are frequently used interchangeably in the literature (Andrich, 1978).
1.7.2 Logit
The logit ("log odds unit") is the unit of measurement when the Rasch model is
used to transform raw scores obtained from ordinal data into log odds ratios on a
common interval scale, as in equation (1.1):

    logit = ln[P / (1 − P)]                                (1.1)

where P is the probability of a correct response.
A logit has the same characteristics as an interval scale, in that the unit of
measurement maintains equal differences between values regardless of location. The
value of 0.0 logit is routinely allocated to the mean of the item difficulty
estimates (Bond and Fox, 2001).
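As an illustrative sketch (my own code, not part of the thesis), the logit transformation of equation (1.1) can be computed from a raw proportion-correct score as follows; the function name is an assumption for illustration:

```python
import math

def logit(p):
    """Log-odds of a proportion p (0 < p < 1), expressed in logits."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# The transform stretches the extremes: equal raw-score gaps near 0 or 1
# correspond to larger logit gaps than the same gaps near 0.5.
```

A proportion of 0.5 maps to 0.0 logits, the conventional origin of the scale.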
1.7.3 Rating Scale Model

The rating scale model is one of the Rasch family of models, developed by Andrich
(1978). It can be applied to polytomous data obtained from ordinal or Likert
scales. In the item response theory framework, the rating scale model is
categorized as a one-parameter logistic model.
1.8 Outline of the Study

This thesis contains six chapters. Chapter 2 provides information about the model
used in carrying out the study, namely the Rasch model; it also reviews the
literature on critical thinking and problem solving. Chapter 3 presents the
research methodology adopted in carrying out the work and explains the descriptive
statistics used in Rasch analysis. Chapter 4 presents the framework for validating
the items, the faculties' achievement, and a discussion of the results obtained
from Winsteps version 3.81.0. Chapter 5 discusses the performance of each faculty
on the CTPST based on SPSS version 16.0 outputs. Finally, the last chapter gives
the conclusions of the study and recommendations for future work.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
dependent on which instrument is used to measure his or her trait (Khairani and
Razak, 2012). However, this shortcoming is avoided by the conjoint measurement
procedure of the Rasch model: in conjoint measurement, the unit of measurement is
not the examinee or the item, but rather the performance of an examinee with
respect to a particular item.
The Rasch model is a measurement method that takes data from student assessments
and transforms them onto a "logit" scale, thereby turning the assessment outcome
into a linear scale with equal intervals (Osman et al., 2012). Rather than
establishing a "best fit line", Rasch produces a reliable, repeatable measurement
instrument (Aziz et al., 2008).
In the case of the Rasch model for a dichotomous item where there are only
two response categories, the mathematical function of the item characteristic curve is
given by equation (2.1) (Rasch, 1960).
    P(x_ni = 1) = e^(β_n − δ_i) / [1 + e^(β_n − δ_i)]        (2.1)

where β_n is the ability of person n and δ_i is the difficulty of item i. The
corresponding odds of success follow from equation (2.1):

    P(x_ni = 1) / [1 − P(x_ni = 1)] = e^(β_n − δ_i)          (2.2)

Taking the natural logarithm gives the linear form of the model:

    ln{P(x_ni = 1) / [1 − P(x_ni = 1)]} = β_n − δ_i          (2.3)
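As a minimal sketch (my own code, not from the thesis), the success probability of the dichotomous Rasch model in equation (2.1) can be computed directly; the function name is an assumption:

```python
import math

def rasch_probability(beta, delta):
    """P(correct response) under the dichotomous Rasch model:
    beta = person ability in logits, delta = item difficulty in logits."""
    return math.exp(beta - delta) / (1.0 + math.exp(beta - delta))

# When ability equals difficulty the probability is exactly 0.5;
# it rises toward 1 as ability increasingly exceeds difficulty.
```

Only the difference beta − delta matters, which is what places persons and items on the same logit scale.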
For Rasch model measurement to yield "examinee-free" item difficulty and
"item-free" examinee ability, two important assumptions must be met. Firstly, the
data must meet the unidimensionality assumption, that is, they represent a single
construct: all items forming the questionnaire measure only the latent trait under
study. Secondly, the Rasch model requires that the data fit the model (Khairani
and Razak, 2012), that is, local independence (the response to a given item is
independent of the responses to the other items in the questionnaire).
In their study, Ghulman and Mas'odi (2009), working with the Rasch model and the
learning domains of Bloom's Taxonomy, identified the reasons behavioral change
occurred. Two findings emerged: either the learning process is at risk, or the
teaching process needs to be revised. The predictive feature of Rasch measurement
also proved very useful for overcoming missing data.
Khairani and Razak (2012) reported growing evidence that the Rasch model has the
capacity to resolve some of the elementary issues in measurement. Nevertheless, to
support construct validity, the model requires further evidence, especially of the
correspondence between the theoretical perspective and the observable behaviors.
Test developers need a thorough understanding of the measured construct,
especially information on the relative difficulties of the items, so that they can
conceptualize the measured construct.
2.2.2 Misfit
Ghulman and Mas'odi (2009), Othman et al. (2011) and Osman et al. (2012) stated
that a simple three-step comparison procedure can be used to figure out which
items do not fit the Rasch model. The three statistics are the point measure
correlation (PMC), the outfit mean square (OMNSQ) and the outfit z-standard
(ZSTD).
Nopiah et al. (2012) stated that a small correlation value means many students
could not answer the question. OMNSQ > 1.5 and ZSTD > 2 indicate that poor
students were unable to answer a difficult question, whereas OMNSQ < 0.5 and
ZSTD < −2 indicate that poor students could not answer an easy question.
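The three-step check can be sketched in code. This is my own illustrative helper, using the OMNSQ and ZSTD thresholds quoted from Nopiah et al. (2012); it is not code from the thesis:

```python
def classify_fit(omnsq, zstd):
    """Classify an item's fit using the quoted thresholds:
    OMNSQ > 1.5 and ZSTD > 2  -> underfit (unmodelled noise),
    OMNSQ < 0.5 and ZSTD < -2 -> overfit (responses too predictable)."""
    if omnsq > 1.5 and zstd > 2.0:
        return "underfit"
    if omnsq < 0.5 and zstd < -2.0:
        return "overfit"
    return "acceptable"

# An item must breach both thresholds of a pair before it is flagged;
# breaching only one leaves it classified as acceptable.
```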
2.2.3 Person and Item Reliability

Khairani and Razak (2012) claimed that a high reliability of item difficulty
measures (0.99) means that the ordering of item difficulty is replicable with
other comparable samples of examinees. Meanwhile, the consistency of examinees'
measures, which is equivalent to Cronbach's alpha, was also high (0.90), implying
that the ordering of examinee proficiency is highly likely to be replicable.
2.2.4 Person and Item Distribution Map

Othman et al. (2011) shared that the person and item distribution map (PIDM) can
indicate the most difficult item and the most able test takers. It can also
identify redundancies among the measured items, so that researchers can decide
whether changes to the instrument are needed.
Nopiah et al. (2012) indicated that a higher position in the PIDM means that the
item is more difficult. The orthogonal arrow in the map shows the gap between two
items: the wider the gap, the more difficulty the students encountered when
attempting to answer the question.
All the researchers concluded that the PIDM gives a clear view of the relationship
between a person's performance (a higher position indicates a better performer)
and item difficulty (a higher position indicates a harder item).
2.2.5 Internal Consistency

In their study, Osman et al. (2012) showed that internal consistency is determined
through Cronbach's alpha, α. An α of 0.66, slightly above the acceptable level of
0.6, makes the model acceptable. Saidfudin et al. (2010) likewise stated that in
normal statistical analysis, if the value of α is disturbingly low, such as
α = 0.33, the evaluation test has to be ignored as it is below the acceptable
level of 0.6.
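For reference, Cronbach's alpha can be computed from an item-score matrix using its standard formula; this is my own sketch, not code from the thesis:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    items = list(zip(*scores))            # one tuple of scores per item

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum_item_var / total_var)

# Alpha approaches 1 when items vary together; values below the 0.6
# threshold cited above would suggest weak internal consistency.
```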
rejected. This is because the Pearson product-moment correlation between the
scores of the WGCTA and the WASI produced correlation coefficients, r, of 0.65 and
above, implying a strong positive relationship between the scores of the WGCTA and
the WASI.
On the other hand, in Australia, England and the United States, the best students
in mathematics also have excellent problem-solving skills. These countries' good
performance in problem solving was mainly driven by their strong performers in
mathematics. This may suggest that in these countries, top performers in
mathematics have opportunities to improve their problem-solving skills (Bortoli
and Macaskill, 2014).
2.5 Summary
CHAPTER 3

RESEARCH METHODOLOGY
3.1 Introduction
This chapter presents the research methodology, describing the research activities
carried out to achieve the objectives of this study. Figure 3.1 shows the flow
chart of the research framework.
In this study, Rasch model analysis is carried out to identify the reliability of
the instrument, the person performance and item difficulties, the misfit items,
and the validity of the instrument. Appendix A describes how to use the software.
There are two ways to identify reliability in this study: Rasch reliability and
internal consistency.
For item and person reliabilities, a value close to 1.0 is considered good
reliability because the value indicates the percentage of observed response variance
that is reproducible. Table 3.1 displays the person and item reliability level by
Sumintono and Widhiarso (2013).
Several perspectives in the Rasch model are used to identify the validity of the
instrument; they are discussed below.
The Rasch model provides two forms of fit statistics: the infit and outfit mean
squares. Mean square (MNSQ) fit statistics show the size of the randomness, that
is, the amount of distortion of the measurement system. The expected value is 1.0.
Values less than 1.0 indicate that observations are too predictable (redundancy;
the data overfit the model), while values greater than 1.0 indicate
unpredictability (un-modeled noise; the data underfit the model).
As stated by Linacre (2002), mean square fit statistics in Rasch analysis are
defined such that the model-specified uniform level of randomness is indicated by
1.0. A value above 1.5 indicates more than 50% unexplained randomness, and values
greater than 2.0 suggest that there is more unexplained noise than explained
noise, indicating more misinformation than information in the observations. Large
mean squares therefore indicate segments of the data that may not support useful
measurement.
The infit (information-weighted) mean square for an item is presented in equation
(3.1):

    Infit MNSQ = Σ_n (x_ni − E_ni)² / Σ_n Var(x_ni)          (3.1)

where x_ni is the observed response of person n to item i, E_ni is its expected
value under the model, and Var(x_ni) is the model variance of the response.

On the other hand, the outfit mean square is an outlier-sensitive fit statistic:
it is more sensitive to responses to items whose difficulty is far from the
person's ability, and vice versa. It is based on the conventional chi-square
statistic, which is a sum of squares of standard normal variables. For ease of
interpretation, this chi-square is divided by its degrees of freedom to give a
mean-square form, reported as "Outfit" as presented in equation (3.2):

    Outfit MNSQ = (1/N) Σ_n (x_ni − E_ni)² / Var(x_ni)       (3.2)
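A sketch of how the two fit statistics of equations (3.1) and (3.2) could be computed for one item, assuming dichotomous data and known person and item parameters; the function names and data layout are my own, not from the thesis:

```python
import math

def rasch_p(beta, delta):
    """Dichotomous Rasch success probability."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

def fit_mnsq(responses, betas, delta):
    """Infit and outfit mean squares for one item.
    responses: 0/1 answers of each person to this item; betas: person abilities."""
    weights, raw_sq, std_sq = [], [], []
    for x, beta in zip(responses, betas):
        p = rasch_p(beta, delta)
        w = p * (1.0 - p)                 # model variance of the response
        r2 = (x - p) ** 2                 # squared raw residual
        weights.append(w)
        raw_sq.append(r2)
        std_sq.append(r2 / w)             # squared standardized residual
    infit = sum(raw_sq) / sum(weights)    # information-weighted (eq. 3.1)
    outfit = sum(std_sq) / len(std_sq)    # unweighted, outlier-sensitive (eq. 3.2)
    return infit, outfit
```

Values near 1.0 for both statistics indicate the amount of randomness the model itself expects.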
The point measure correlation is the correlation between the observations and the
Rasch measures, as in equation (3.3). The correlation ranges from −1 to +1, and
the accepted range for the point measure correlation is 0.4 < PMC < 0.8 (Othman et
al., 2011; Nopiah et al., 2012).

    PMC = [N Σ X_n Y_n − (Σ X_n)(Σ Y_n)] /
          √{[N Σ X_n² − (Σ X_n)²] [N Σ Y_n² − (Σ Y_n)²]}     (3.3)

where X_1, ..., X_N are the responses by the persons (or on the items), and
Y_1, ..., Y_N are the person measures (item easiness = − item difficulty).
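Equation (3.3) is the ordinary Pearson product-moment correlation, which can be sketched directly (my own code, not from the thesis):

```python
import math

def point_measure_correlation(x, y):
    """Pearson correlation between responses x and Rasch measures y (eq. 3.3)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return numerator / denominator
```

Values inside the accepted band 0.4 < PMC < 0.8 indicate that higher-ability persons tend to score higher on the item.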
The Rasch model provides two useful indices describing the separation of items on
a variable and the separation of persons on a scale, respectively. The separation
coefficient, S, is the ratio of the person (or item) adjusted standard deviation,
SA, to the average measurement error, SE, known as the standard deviation of the
error. Based on Nopiah et al. (2012), a low separation value means less
variability of persons on the trait, and vice versa.
Person separation, SP, is used to classify people and to estimate how well the
scale identifies individual differences. It is calculated from equation (3.4):

    SP = SA_P / SE_P        (3.4)

With a relevant person sample, low person separation (SP < 2) implies that the
instrument may not be sensitive enough to distinguish between high and low
performers; in short, more items may be needed.

Item separation, SI, is used to verify the item hierarchy and to estimate how well
the scale separates the test items. It is determined from equation (3.5) (Chung,
2005):

    SI = SA_I / SE_I        (3.5)

Low item separation (SI < 3) indicates that the person sample is not large enough
to confirm the item difficulty hierarchy. Therefore, to establish the validity of
the instrument, more respondents may be needed.
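Under the definition above, the separation coefficient can be sketched as follows. Here the "true" (adjusted) standard deviation SA is derived from the observed spread and the average error under the usual assumption SA² = SD_observed² − SE², which is my reading rather than a formula quoted in the text:

```python
import math

def separation(sd_observed, se_average):
    """Separation coefficient S = SA / SE (used for both persons and items)."""
    sa = math.sqrt(max(sd_observed ** 2 - se_average ** 2, 0.0))
    return sa / se_average

# Per the thresholds above, a person separation below 2 (or item separation
# below 3) would call for more items or more respondents respectively.
```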
3.3.3 Identify the Person Performance and Item Difficulties of the Instrument
The higher the location of an item above the item mean, μI, the more difficult
that item is compared to an item at a lower location. Similarly, in the person
distribution, the excellent students are located at the top of the map while the
poor students are located at the bottom. Therefore, the level of a person's
ability can be identified from the PIDM by looking at the separation between the
person and the item on the map: the bigger the separation, the more likely the
person is to achieve the item.
The total participants in this study were 981 first year undergraduate students
in Universiti Teknologi Malaysia (UTM) from 12 faculties and schools. They are
from Faculty of Built Environment (FAB), Faculty of Biosciences and Medical
Engineering (FBME), Faculty of Civil Engineering (FKA), Faculty of Computing
(FC), Faculty of Electrical Engineering (FKE), Faculty of Chemical Engineering
(FChE), Faculty of Geoinformation and Real Estate (FGHT), Faculty of
Management (FM), Faculty of Science (FS), Faculty of Petroleum and Renewable
Energy Engineering (FPREE), Razak School of Engineering and Technology (RS)
and Malaysian-Japan International Institute of Technology (MJIIT).
3.4.2 Mode
The test consists of two parts, Part A and Part B, in which students are required
to answer all questions; the questions involve only basic calculation and logical
thinking. There are 23 questions, including the sub-questions. This instrument was
assumed fit to measure the critical thinking ability of students. The students
were given one hour to answer all the questions.
In this study, four critical thinking skills are evaluated: the ability to define
and analyse problems in complex, overlapping, ill-defined domains and make
well-supported judgments (CTPS 1); the ability to apply and improve on thinking
skills, especially skills in reasoning, analyzing and evaluating (CTPS 2); the
ability to look for alternative ideas and solutions (CTPS 3); and the ability to
"think outside the box" (CTPS 4). Each CTPS skill has its own criteria, as shown
in Table 3.7. For example, CTPS 1_1_1 evaluates students' ability to state and
define the problem, CTPS 2_2_1 evaluates students' ability to state the
"how/when/where/what", and so on.
CTPS 2: Ability to apply and improve on thinking skills, especially skills in
reasoning, analyzing and evaluating

Aspects of Assessment   Descriptors                Performance criteria
CTPS2-AA1               Reasoning/Rationalizing    1. Ability to state the "why"
CTPS2-AA2               Analyzing                  1. Ability to state the
                                                      "how/when/where/what"
CTPS2-AA3               Evaluating                 1. Ability to re-conciliate the
                                                      whole information
                                                   2. Ability to make judgment
Criteria (scored 4 to 1):

2. Ability to produce alternative ideas
   Score 4: Produced outstanding ideas
   Score 3: Produced acceptable ideas
   Score 2: Produced insufficient ideas
   Score 1: Unable to produce alternative ideas
Aspect of Assessment: CTPS3-AA3 Innovative

Criteria (scored 4 to 1):

1. Ability to consider alternative course of action/policy options
   Score 4: High consideration to alternative course of action/policy options
   Score 3: Adequate consideration to alternative course of action/policy options
   Score 2: Barely able to consider alternative course of action/policy options
   Score 1: Unable to consider alternative course of action/policy options

2. Ability to apply new approaches to problem solving
   Score 4: Highly capable in applying new approaches to problem solving
   Score 3: Capable in applying new approaches to problem solving
   Score 2: Barely capable in applying new approaches to problem solving
   Score 1: Unable to apply new approaches to problem solving
In this study, the mode will be used to describe the central position of the
variable, accompanied by tabulated and graphical descriptions using tables and
charts respectively.
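In code, the mode of a set of level codes can be obtained with the Python standard library; the data below are invented purely for illustration:

```python
from statistics import multimode

# Hypothetical CTPS level codes for eight respondents (illustrative only)
levels = [3, 2, 3, 4, 3, 2, 1, 3]

# multimode returns every most-frequent value, so ties are not hidden
print(multimode(levels))  # → [3]
```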
3.6 Summary
4.1 Introduction
In this chapter, all collected data were analyzed using Winsteps version 3.81.0. A
full discussion is given of the validation of the Critical Thinking Problem
Solving Test (CTPST) and of the results for persons and items, in total and for
each CTPS.
In Rasch analysis, there are two types of summary statistics: the person measure
and the item measure. The results from the Rasch analysis are therefore explained
in detail as follows.
The person measure summarizes the sample of the study. Figure 4.1 illustrates the
summary of the 981 measured persons. There is a fair person spread of 3.21 logits
(between the maximum measured person at 0.95 and the minimum measured person at
−2.26), with person separation SP = 1.60 and good internal consistency of the
examinees' measures (equivalent to Cronbach's alpha) of 0.73, above the acceptable
level of 0.60. This shows that more items are needed to distinguish high and low
performers, and that there are inter-correlations among the CTPST items,
respectively.
As the separation and person reliability values are high, there is more
variability of persons on the trait and in the responses to the statements in the
questionnaire; likewise, a higher reliability coefficient indicates greater
consistency of the data (Kasim and Annuar, 2011). The major finding is that the
person mean, μP = −0.26 logit, is lower than the item mean, μI = 0. These values
show that the students were below the expected performance in answering the
questions, although the CTPST solutions were designed with basic logical
interpretations and reasonable decisions, to measure the undergraduate students'
critical thinking level without applying any specific mathematical models.
Razak, 2012). Therefore, the items in CTPST are suitable to be applied to any first
year undergraduate students regardless of any specific faculty.
The item separation, SI = 13.85 indicates that there are 14 groups classifiable
from the question which are CTPS 1_1_1 (ability to state and define the problem),
CTPS 1_1_2 (ability to identify related concepts), CTPS 1_1_3 (ability to make
correct assumptions), CTPS 1_2_2 (ability to compare and contrast), CTPS 2_1_1
(ability to state the "why"), CTPS 2_2_1 (ability to state the "how/when/where/what"),
CTPS 2_3_1 (ability to re-conciliate the whole information), CTPS 2_3_2 (ability to
make judgment), CTPS 3_1_2 (ability to contrast different ideas), CTPS 3_2_2
(ability to produce alternative ideas), CTPS 3_3_1 (ability to consider alternative
course of action), CTPS 3_3_2 (ability to apply new approaches to problem solving),
CTPS 3_3_3 (ability to realize the solution and put into practice) and CTPS 4_3_1
(ability to contrast from mainstream ideas towards problem solving).
By referring to Figures 4.1 and 4.2, the zero point on the Rasch scale does not
represent zero critical thinking level. It is an artificial point representing the mean of
the item difficulties, calibrated by default to be zero, in Rasch measurement as
displayed in PIDM.
The person and item distribution map (PIDM) gives a better picture of how the
students relate to the respective questions, since the items and the students are
located along the same proficiency scale. It gives a clearer view of each person's
ability and the relevant item difficulty: a higher position indicates that the
items are more difficult and that the students at the top display higher ability;
going down, the items become easier and the students display less ability. The
orthogonal arrow shows the gap between two items. The wider the gap, the more
difficulty the students encountered when attempting to answer the question.
Comparing the CTPS skills, Figure 4.3 shows a total of nine items designed to
evaluate skills involving CTPS 1, with item difficulties ranging from −0.99 to
0.62. By referring to Figure 4.4, Q1_CTPS 1_1_2 was the most difficult question
while Q8_CTPS 1_1_2 was the easiest, although both evaluate the same skill, namely
students' ability to identify related concepts.
Most of the students could not score full marks for Q1_CTPS 1_1_2 because they
could not explain their answers, owing to a lack of understanding of the question.
Nevertheless, some students answered all the items correctly, since their logit
values are higher than that of Q1_CTPS 1_1_2. Conversely, some students were
unable to answer even the easiest question (Q8_CTPS 1_1_2) correctly, because they
could not interpret the relationship between performance and revision hours from
the graph provided in the questionnaire.
For CTPS 2, item difficulties ranged from −0.47 to 0.59 (Figure 4.5) among eight
items. Q6_CTPS 2_3_1 (ability to re-conciliate the whole information) was hard to
answer, while Q10_CTPS 2_1_1 (ability to state the "why") was easier for students
to interpret, as illustrated in Figure 4.6.
Question 6 with CTPS 2_3_1 was categorized as the hardest item in CTPS 2. This
might be because the students could not merge the information from the given
situation with the explanation of their answer, as the item evaluates students'
ability to re-conciliate the whole information.
For CTPS 3, item difficulties ranged from −0.52 to 0.52 (Figure 4.7) among five
items. As displayed in Figure 4.8, item 2 (Q4_CTPS 3_1_2), which evaluates
students' ability to contrast different ideas, was the easiest question, while
item 4 (Q5_CTPS 3_2_2), which tests students' ability to strategize a method of
solution, was a difficult item.
Although question 4 with CTPS 3_1_2 is the easiest question, many students were
unable to answer it correctly; it can therefore be concluded that students are
unable to look for alternative ideas. From question 5 with CTPS 3_2_2, it can be
concluded that most of the university students are still unable to form a strategy
from a given situation.
For CTPS 4, only one item was evaluated, so no range of item difficulties can be
reported. More items need to be added in order to measure the CTPS 4 skills.
Figure 4.9 shows the students with the lowest score (−2.26 logit, referring to
Figure 4.1), who can be categorized as the students with the poorest ability.
Similarly to the PIDM, the person and item histogram in Figure 4.10 can also place
person and item locations on a single scale. The smaller the proportion of correct
responses, the higher the difficulty of an item and hence the higher the item's
scale location. Since the histogram shows four bars for the items, the questions
can be classified into four categories: difficult, moderate, easy and very easy,
decided by rule of thumb (Sumintono and Widhiarso, 2013). Once the item locations
are scaled, the person locations are measured on the same scale.
Figure 4.9 also shows that question 6 with CTPS 2_3_1 (ability to re-
conciliate the information) is the hardest question and question 8 with CTPS 1_1_2
(ability to identify related concepts) is the easiest question in the CTPST.
Question 10 with CTPS 1_1_2 (ability to identify related concepts) has the largest
gap, which indicates that students faced more difficulty when attempting to answer
it; this implies that students are unable to draw a conclusion or relationship
from a given situation. However, for question 10 with CTPS 2_3_1 (ability to
re-conciliate the information), which requires students to make a conclusion based
on the graph, most of the students answered correctly, implying that they are
still able to think critically.
From Figures 4.9 and 4.10, it can be concluded that most of the students can
answer the "moderate" level questions. Also, very few students answered correctly
the hardest question (Q6, CTPS 2_3_1) or the easiest question (Q8, CTPS 1_1_2);
this might be due to the students providing a wrong explanation in their answer
and to misunderstanding the question, respectively.
4.4 Misfit
Rasch analysis helps to identify items that are not suitable for inclusion in the
instrument, i.e. misfit items. The total score, or raw score, is the number of
respondents who answered the corresponding item correctly, and the total count
tells us that 981 students responded to the items. The measure is the logit
position of the item; the bigger the value, the more difficult the item. As stated
in Chapter 3, to identify misfit items, controls were applied to check item
acceptability against 0.40 < PMC < 0.85, 0.50 < OMNSQ < 1.50 and
−2.0 < ZSTD < 2.0.
The validity of each question can be judged from the point measure correlation
analysis. From Figure 4.11, there is a small correlation (0.21) for item
Q1_CTPS 2_1_1 (ability to state the "why"), which shows that many of the students
could not answer the question and only a few could.
The figure also shows that item Q8_CTPS 1_1_2 (ability to identify related
concepts) needs review: with PMC = 0.34 < 0.40, OMNSQ = 1.32 < 1.5 and
ZSTD = 3.4 > 2, it only partly meets the discrimination criteria of a quality
question, but it is not considered a misfit item because not all of the criteria
fall outside the acceptable ranges.
Figure 4.11 also presents the 23 items, sorted in descending order by the
"Measure" column. Only one item (Q2_CTPS 2_3_2, with PMC = 0.24, OMNSQ = 1.53 and
ZSTD = 9.9) was found to fall outside the acceptable regions. Further analysis of
the misfit items should be undertaken as part of enhancing the instrument; two
actions might be considered, namely rephrasing or deleting the item.
The scalogram also illustrates that persons (students) 34, 83, 744 and 927 failed
to score on question Q2_CTPS 2_3_2, which concerns the ability to make judgments, even
though they are top-performing students. Hence, it appears that first year
undergraduate students have yet to develop critical thinking skills.
Based on the scalogram, the top 100 students were chosen to identify each faculty's
performance, as shown in Figure 4.13. The figure implies that FKE shows the
best performance, with 17 students in the top 100. On the other hand, poor performance
was shown by FGHT and RS, where no students scored within the top 100. Hence,
more critical thinking problem solving assessments need to be conducted for these
students. The figure also illustrates that students from most of the engineering faculties, namely FBME, FChE,
FPREE, FKA and FKE, are in the top 100. Individually, the highest performer is
from FKA.
In conclusion, as depicted in Figure 4.9, the PMC values ranged from 0.21 to
0.57, with no item having a zero or negative value. This correlation indicated that
all items were working together in the same way in defining the critical thinking
problem solving test items. The means of the infit and outfit MNSQ, 0.99 and 1.00
respectively, were close to the value expected by the model, 1.00. This suggests that
the amount of distortion of the measurement was minimal. Therefore, the CTPST is
suitable for determining students' critical thinking level in solving
problems.
4.5 Unidimensionality
Investigation of dimensionality was carried out to ensure that the CTPST was
measuring only a single construct, the CTPS construct, and no other skills. The raw variance
explained by measures is 31.1%, compared to the 31.0% expected by the Rasch model
(Figure 4.14). According to Sumintono and Widhiarso (2013), an
instrument is unidimensional when its raw variance explained by measures is at least
20%. In this study, the unexplained variance in the first contrast is 8.1%, which is
still acceptable as it is well below the maximum level of 15%. This may be regarded
as "noise", that is, question characteristics that influence students' understanding while they
attempt to answer.
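The two dimensionality thresholds cited above (raw variance explained by measures at least 20%, and unexplained variance in the first contrast below 15%) can be expressed as a simple check. The function below is an illustrative sketch of those decision rules, not Winsteps output:

```python
def is_unidimensional(raw_variance_explained, first_contrast_unexplained):
    """Apply the thresholds used in the text: treat the instrument as
    unidimensional when measures explain at least 20% of raw variance and
    the unexplained variance in the first contrast is below 15%."""
    return raw_variance_explained >= 20.0 and first_contrast_unexplained < 15.0

# Values reported for the CTPST: 31.1% explained, 8.1% in the first contrast.
print(is_unidimensional(31.1, 8.1))
```

For the reported CTPST values the check passes, consistent with the conclusion that the instrument measures a single construct.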
Figure 4.15 shows that there are three items with standardized residual
correlations greater than or equal to 0.70. They are therefore considered the "noise" in
the instrument: item 8 (Q4_CTPS 3_3_3), which evaluates students'
ability to realize the solution and put it into practice by arranging nine sticks to form
five equilateral triangles; item 10 (Q5_CTPS 3_2_2), which evaluates students' ability
to strategize a method of solution by forming four smaller pieces of equal size and
shape from a given paper; and item 12 (Q6_CTPS 2_3_1), which evaluates students'
ability to reconcile all the information by providing a correct explanation. Thus,
these items may need to be rephrased to ensure more accurate results can be
obtained. However, they can be accepted, as the "noise" is below the maximum level of
15%. Therefore, it can be concluded that the CTPST is acceptable for measuring
students' critical thinking level in solving problems.
4.6 Summary
The PIDM and scalogram show that the majority of the students are able to
answer the questions at the moderate level. FKE students have the highest
achievement, with 17 students in the top 100. Some top students could not
answer moderate questions, and the weakest student was found to have
an ability level below the minimum item difficulty. In this study, most of the students were
unable to score full points when giving explanations.
CHAPTER 5
DATA ANALYSIS
5.1 Introduction
In Chapter 4, from the Rasch analysis, the Critical Thinking Problem Solving
Test (CTPST) had been validated. Therefore, in this chapter, discussion on the
demographic data such as gender and faculties of the participants will be done. The
result of the data analysis from 981 respondents will also be presented. All data
collected was analyzed by using Statistical Package for the Social Sciences (SPSS)
version 16.0 for Windows. The data analyzed by using descriptive method and the
results will be shown in tables and figures.
Table 5.1 below presents the students' gender distribution. A total of 441
male and 540 female respondents participated in this study.
Based on Figure 5.1, 71% of the respondents were Malay students and 16% were
Chinese students, followed by 5% and 8% of respondents who were
Indian and of other races, respectively.
Figure 5.1: Race distribution (Malay 71%, Chinese 16%, India 5%, Others 8%)
The sample of this study consists of 981 respondents, who are first year
undergraduate students from 12 faculties and schools in Universiti Teknologi
Malaysia (UTM), as illustrated in Figure 5.2 and Table 5.2. They include the Faculty of Built
Environment (FAB), the Faculty of Biosciences and Medical Engineering (FBME), the
Faculty of Civil Engineering (FKA), and the Faculty of Computing (FC), among others.
Referring to Table 5.2 and Figure 5.2, at least 80 students from each of the
engineering faculties were involved in the study. FBME had the most, with 124 students
(13%), followed by FChE with 107 students (11%). RS had the fewest students taking
the CTPST, with only 25 students (3%) out of 981.
Also, more than half of the selected faculties each contributed less than 10% of the
respondents: FAB, FC, FGHT, FM, FPREE, FS, MJIIT and
RS. Cooperation with the faculties' offices is important to ensure that all first year
undergraduate students take part, as this helps in evaluating the students' critical
thinking level.
The CTPST questionnaires were marked based on the CTPS rubric. There are four
levels of rating scale, with one as the lowest score and four as the highest.
Students' answers were recorded in SPSS, and the mode of the CTPS rating
scale was calculated to identify each student's critical thinking level in solving problems.
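The scoring step described above, taking the mode of a student's CTPS ratings across the items, can be sketched as follows. The rating vectors below are made-up examples, not actual respondent data:

```python
from statistics import mode

# Each student's answers are rated 1 (lowest) to 4 (highest) per item;
# the student's critical thinking level is taken as the mode (most frequent
# rating) across the items. These rating vectors are hypothetical.
students = {
    "student_A": [4, 4, 3, 4, 2, 4, 1, 4],
    "student_B": [1, 1, 2, 1, 1, 3, 1, 2],
}
for sid, ratings in students.items():
    print(sid, "level:", mode(ratings))
```

Here student_A's level resolves to 4 and student_B's to 1, mirroring how a respondent's dominant rating determines the reported critical thinking level.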
FKE students have the highest critical thinking level in solving problems,
with the largest number of students obtaining a score of four. Figure 5.3 displays a
clear comparison of the score frequencies across the faculties.
In contrast, FGHT has the highest number of students obtaining the lowest
score, with 21 items answered at score one and only one item answered
correctly. In other words, FGHT shows the lowest performance in the
CTPST among the faculties.
In Table 5.3, most of the students obtained score one, followed by score four.
This is because some of the participants did not solve the problems, which may be due
to a lack of time or to not knowing the method of solution, reflecting the students'
understanding of and logical thinking about the questions; for example, problem 5 with
CTPS 3_2_2 (ability to strategize a method of solution).
Table 5.3 (final rows):
MJIIT    13   1   1   8
RS       17   1   0   5
TOTAL   182  19  12  73
5.4 Summary
From the results, it can be interpreted that the students' critical thinking
level in solving problems needs to be enhanced during their four years of study,
so that they are ready to face obstacles after graduating and to apply the knowledge
and skills delivered by their lecturers.
CHAPTER 6
6.1 Introduction
This chapter begins with the conclusions of the study, drawn
from the computational experiment that was carried out. Besides the general
conclusions from the data, suggestions for future studies to improve the work
are also given.
6.2 Conclusions
This study was carried out to validate the CTPST and to determine the critical
thinking level among first year undergraduate students in UTM. The main
conclusions of this study are as follows:
(i) The items are suitable for all first year undergraduate students, as they
involve only non-routine questions that capture CTPS skills and do
not follow any specific mathematical syllabus, and they show high reliability
and validity.
(ii) FKE has the highest achievement in the CTPST. However, the overall
achievement shows that the students have low critical thinking skills
in solving problems.
(iii) The items in the CTPST are unidimensional; they measure only the critical
thinking level of the respondents.
(iv) More items are needed to enable the researcher to distinguish between high
and low performers.
The findings of this research suggest that each faculty needs to build the
critical thinking skills of its students by making curricular changes aimed at
improving students' critical thinking skills.
6.3 Recommendations
After completing this report, these are some suggestions for future studies
that can be considered. The recommendations for further research include:
Aziz, A. A., Mohamed, A., Arshad, N. H., Zakaria, S., Ghulman, H. A. & Masodi, M.
S. (2008). Development of Rasch-based Descriptive Scale in profiling
Information Professionals' Competency. Proceedings of International
Symposium on Information Technology, 2008 (ITSim 2008). 26-28 Aug. 1-8.
Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental
measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum.
Kasim, R. S. R. & Annuar, A. (2011). Cognitive styles: Web portal acceptance items
measurement. Proceedings of 2011 IEEE International Conference on
Computer Applications and Industrial Electronics (ICCAIE). 4-7 Dec. 2011,
427-431.
Knutson, N., Akers, K. S. & Bradley, K. D. (2010). Applying the Rasch Model to
Measure First-Year Students' Perceptions of College Academic Readiness. 13.
Mourtos, N. J., Okamoto, N. D. & Rhee, J. (2004). Defining, teaching, and assessing
problem solving skills. Proceedings of 7th UICEE Annual Conference on
Engineering Education. Mumbai, India. 1-5.
Nopiah, Z. M., Rosli, S., Baharin, M. N., Othman, H. & Ismail, A. (2012).
Evaluation of pre-assessment method on improving student's performance in
complex analysis course. Asian Social Science, 8(16), 134-139.
OECD. (2014). PISA 2012 Results: Creative Problem Solving: Students' Skills in
Tackling Real-Life Problems (Vol. V, pp. 254): PISA, OECD.
Othman, H., Asshaari, I., Bahaludin, H., Nopiah, Z. M. & Ismail, N. A. (2011).
Evaluating the Reliability and Quality of Final Exam Questions Using Rasch
Measurement Model: A Case Study of Engineering Mathematics Courses.
Kongres Pengajaran dan Pembelajaran. 163-173.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests.
Copenhagen: Danish Institute for Educational Research.
Saidfudin, M., Azrilah, A. A., Rodzo'An, N. A., Omar, M. Z., Zaharim, A. & Basri,
H. (2010). Easier learning outcomes analysis using Rasch model in
engineering education research. Proceedings of the 7th WSEAS international
conference on Engineering education. Corfu Island, Greece. 442-447.
Williams, B., Onsman, A. & Brown, T. (2012). A Rasch and Factor Analysis of a
Paramedic Graduate Attribute Scale. Eval Health Prof 35(148), 22.
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA
Press.
APPENDIX A
Description of Winsteps Software
1. Double click on the “Winsteps” shortcut icon on the computer desktop.
2. A pop-up screen will appear as shown below; click on the button “Import
from Excel, R, SAS, etc.”
3. Click on the green “Excel” button if the data is saved in Excel format.
7. The variable labels need to be copied and pasted under “Item Response
Variable” and “Person Label Variable” based on the study variables.
8. Next, click on the “Construct Winsteps file” button.
9. A pop-up will appear; the user needs to name the file and click “Save”.
10. After that, Winsteps will scan and format the data until the
following pop-up is shown.
11. Then, press the “Enter” key twice. All the choices on the tab will be unlocked,
so the researcher can click on “Output Table” to obtain the selected
analysis.
13. The output will be displayed in Notepad format. The researcher can save it and
carry out further interpretation of the results.
14. The user can also click on “Graph” to view the curves or the person and item
histograms instead of the PIDM.