
A RASCH MODEL ANALYSIS OF CRITICAL THINKING PROBLEM

SOLVING TEST

CHAI HUI CHUNG

A dissertation submitted in partial fulfillment of the


requirements for the award of the degree of
Master of Science (Mathematics)

Faculty of Science
Universiti Teknologi Malaysia

JANUARY 2015

DEDICATION

To my beloved grandfather, father and mother



ACKNOWLEDGEMENT

In preparing this dissertation, I was in contact with many people who contributed to my understanding and thoughts. In particular, I would like to express my sincere appreciation to my thesis supervisor, Dr. Norazlina, for her guidance and encouragement throughout this study.

I would also like to express my deep appreciation to my beloved family members, who have always given me spiritual support. They supported me throughout the study; without their encouragement and motivation it would not have been successful.

In addition, special thanks to the Critical Thinking Problem Solving committee for giving me the opportunity to take part in this project. They are Prof. Dr. Zainal Abdul Aziz, Assoc. Prof. Dr. Hjh. Rohanin Ahmad, my supervisor Dr. Norazlina Ismail, Dr. Arifah Bahar, Dr. Zaitul Marlizawati Zainuddin, Dr. Hjh. Zarina Mohd Khalid, and Dr. Shariffah Suhaila Jamaludin. The librarians at Universiti Teknologi Malaysia (UTM) also deserve special thanks for their assistance in finding the relevant literature.

Last but not least, I would like to thank all the lecturers and friends who have guided me, directly or indirectly, in completing this dissertation, especially Mariam, who gave me invaluable assistance throughout my research work. I thank them for their kindness and moral support.

Thank you.

ABSTRACT

The Rasch measurement model is used in many studies to determine the validity of an instrument. This study measures the validity of the items and the performance of first year undergraduate students in Universiti Teknologi Malaysia (UTM) by using the Rasch model. A sample of 981 students took part in the study. The research instrument used was the Critical Thinking and Problem Solving Test (CTPST). The collected data were analyzed using Winsteps 3.81 and the Statistical Package for the Social Sciences (SPSS) 16.0 for Windows, and the results are presented as logit values and modes respectively. The findings show that the CTPST is suitable for all first year undergraduate students, as it involves only non-routine questions that capture CTPS skills and does not rely on any specific mathematical problems. Students from the Faculty of Electrical Engineering (FKE) had the highest achievement in the CTPST. However, the overall achievement shows that the students have low critical thinking skills in solving problems. The items in the CTPST also show unidimensionality and fit the model, although there are misfit items.

ABSTRAK

The Rasch model has been widely applied in research to determine the validity of instruments. Accordingly, this study discusses why the Rasch model is applied in examining the validity of the questions and the critical thinking of first year undergraduates at Universiti Teknologi Malaysia (UTM). A total of 981 undergraduates took part in the study. The research instrument used was the Critical Thinking and Problem Solving Test (CTPST). The data obtained were analyzed using the Winsteps 3.81 software and the Statistical Package for the Social Sciences (SPSS) 16.0 for Windows. The findings are presented in the form of logits and modes. The results from Winsteps and SPSS show that the CTPST involves only non-routine questions that are suitable for all first year undergraduates, that the critical thinking level of students in the Faculty of Electrical Engineering (FKE) is the most satisfactory, but that the overall achievement is less satisfactory. The items in the CTPST also show unidimensionality and fit the model even though there are misfitting items.

TABLE OF CONTENTS

CHAPTER TITLE PAGE

DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENTS iv
ABSTRACT v
ABSTRAK vi
TABLE OF CONTENTS vii
LIST OF TABLES xi
LIST OF FIGURES xii
LIST OF ABBREVIATIONS xiii
LIST OF SYMBOLS xiv
LIST OF APPENDIX xv

1 INTRODUCTION 1
1.1 Introduction 1
1.2 Background of the Study 3
1.3 Problem Statement 4
1.4 Objectives of the Study 4
1.5 Significance of the Study 5
1.6 Scope of the Study 5
1.7 Definition of Terms 6
1.7.1 Latent Trait 6
1.7.2 Logit 6
1.7.3 Rating Scale Model 7
1.8 Outline of the Study 7

2 LITERATURE REVIEW 8
2.1 Introduction 8
2.2 Rasch Model 8
2.2.1 Fit Statistics 11
2.2.2 Misfit 11
2.2.3 Person and Item Reliability 12
2.2.4 Person and Item Distribution Map 12
2.2.5 Internal Consistency 13
2.3 Critical Thinking 13
2.4 Problem Solving 14
2.5 Summary 15

3 RESEARCH METHODOLOGY 16
3.1 Introduction 16
3.2 Research Framework 16
3.3 Rasch Model Analysis 16
3.3.1 Identify the Reliability of the Instrument 17
3.3.1.1 Rasch Reliability 18
3.3.1.2 Internal Consistency 18
3.3.2 Identify the Validity of the Instrument 19
3.3.2.1 Infit and Outfit Mean Square 19
3.3.2.2 Standardized Fit Statistics 21
3.3.2.3 Point Measure Correlation 22
3.3.2.4 Person and Item Separation 22
3.3.3 Identify the Person Performance and Item Difficulties of the Instrument 23
3.3.4 Identify the Misfit Item in the Instrument 24
3.4 Descriptive Summary 24
3.4.1 Respondents of the Study 25
3.4.2 Mode 25
3.5 Research Instrument 25
3.6 Summary 31

4 RASCH MODEL ANALYSIS 32


4.1 Introduction 32
4.2 Summary Statistics 32
4.2.1 Person Measure 32
4.2.2 Item Measure 33
4.3 Person and Item Distribution Map 35
4.4 Misfit 42
4.5 Unidimensionality 45
4.6 Summary 46

5 DATA ANALYSIS 48
5.1 Introduction 48
5.2 Respondents’ Demographic 48
5.2.1 Gender Distribution 48
5.2.2 Race Distribution 49
5.2.3 Faculty Distribution 49
5.3 Critical Thinking Problem Solving Level 51
5.4 Summary 53

6 CONCLUSIONS AND RECOMMENDATIONS 54


6.1 Introduction 54
6.2 Conclusions 54
6.3 Recommendations 55

REFERENCES 57
Appendix A 61 - 63

LIST OF TABLES

TABLE NO. TITLE PAGE

3.1 Person and item reliability level 18


3.2 Internal consistency 19
3.3 MNSQ value and effects on the measurement 21
3.4 ZSTD value and effects on the measurement 21
3.5 Keys in PIDM 24
3.6 Accepted range 24
3.7 CTPS rubrics 27
3.8 CTPS score rubrics 28
5.1 Gender distribution 49
5.2 Faculty distribution 50
5.3 Mode on CTPST rating scale 52

LIST OF FIGURES

FIGURE NO. TITLE PAGE

3.1 Framework of the research methodology 17


4.1 Summary of 981 measured person 33
4.2 Summary of 23 measured item 34
4.3 Item statistics CTPS 1 35
4.4 Person and item distribution map CTPS 1 36
4.5 Item statistics CTPS 2 37
4.6 Person and item distribution map CTPS 2 38
4.7 Item statistics CTPS 3 38
4.8 Person and item distribution map CTPS 3 39
4.9 Person and item distribution map CTPS in Total 40
4.10 Person and Item Histogram in Total 41
4.11 Item statistics of 23 measured item 42
4.12 Scalogram for top 30 students 44
4.13 Performance of top 100 students 44
4.14 Standardized residual variance 46
4.15 Identify item with noise 46
5.1 Race distribution 49
5.2 Faculty distribution (%) 50
5.3 Mode on CTPST rating scale 51

LIST OF ABBREVIATIONS

AA - Aspect of Assessment
CTPS - Critical Thinking Problem Solving
CTPST - Critical Thinking Problem Solving Test
FAB - Faculty of Built Environment
FBME - Faculty of Biosciences and Medical Engineering
FC - Faculty of Computing
FChE - Faculty of Chemical Engineering
FGHT - Faculty of Geoinformation and Real Estate
FKA - Faculty of Civil Engineering
FKE - Faculty of Electrical Engineering
FM - Faculty of Management
FPREE - Faculty of Petroleum and Renewable Energy Engineering
FS - Faculty of Science
MJIIT - Malaysia-Japan International Institute of Technology
MNSQ - Mean Square
OMNSQ - Outfit Mean Square
PIDM - Person and Item Distribution Map
PMC - Point Measure Correlation
RS - Razak School of Engineering and Technology
SA - Adjusted Standard Deviation
SE - Average Measurement Error
SPSS - Statistical Package for the Social Sciences
UTM - Universiti Teknologi Malaysia
WASI - Whimbey Analytical Skills Inventory
WGCTA - Watson Glaser Critical Thinking Appraisal
ZSTD - Outfit Z-Standard

LIST OF SYMBOLS

α - Cronbach’s Alpha
μ - Mean
% - Percentage
Q - Question

LIST OF APPENDIX

APPENDIX TITLE PAGE

A Description of Winsteps software 61


CHAPTER 1

INTRODUCTION

1.1 Introduction

For physical traits such as height, the process of assigning numbers can be done directly using a ruler. However, psychological traits such as ability or proficiency are constructs: they are unobservable but can be measured indirectly through a test by using a measurement tool (Khairani and Razak, 2012). Therefore, for tests that relate observable traits (such as test scores) to unobservable traits (such as ability or proficiency), researchers apply the Rasch model.

The Rasch model is relatively new to the field of counseling psychology, but several of its advantages appear promising. For example, it is beneficial in identifying unexpected results. In classical test models, outliers are identified by extreme scores, while scores in the middle ranges are taken to be acceptable as long as the instrument has generally been shown to be reliable. The Rasch model, on the other hand, can identify a research participant who has responded randomly to the instrument.

The Rasch model is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses. In addition to psychometrics and educational research, the Rasch model and its extensions are used in other areas, including the health industry (Williams et al., 2012), because of their general applicability. The model expresses a response as a function of the trade-off between the respondent's ability, attitude or personality trait, for example critical thinking problem solving skill, and the item difficulty.

Critical thinking is a major educational outcome required of higher education institutions. Today, more than ever, educational programs are challenged to develop students' critical thinking skills. In light of the shifting scope of practice in various problem solving settings, every graduating student must be capable of adapting to these ever-changing demands. Because of the demands placed on education institutions to deliver quality skills in an interdisciplinary environment, the development of critical thinking skills among university students is essential.

As stated in the Malaysia Education Blueprint 2013-2025 (Ministry of Education, 2013), thinking skills are one of the attributes and aspirations needed by every student. The three elements mentioned under thinking skills are critical thinking and innovation, problem solving and reasoning, and learning capacity. The aim is to encourage students to be innovative, to approach issues critically and to embrace lifelong learning.

Students nowadays tend to have negative attitudes towards problem solving questions in their studies. Thus, it is very important to consider the factors affecting the quality of understanding, and to assess the validity of the assessment being carried out. Appropriate assessment tools in the teaching and learning process are required to measure students' understanding and ability fairly and equally. Moreover, in constructing problem solving questions, it is crucial that the examination questions are distributed evenly across Bloom's critical thinking skills, the level of students' ability and the level of question (item) difficulty (Bloom, 1956).

Therefore, lecturers must gather, analyze and process information to make logical decisions. These decisions can be complex and require multiple levels of decision making. Regardless of the magnitude of the decisions to be made, it is essential that lecturers have the clinical reasoning and critical thinking skills to make good decisions. However, do these students have critical thinking skills and the ability to apply those skills in many different contexts? Can deans or program directors at colleges and universities ensure that graduating students are able to think critically in complex situations?

In short, although the Rasch model measures an abstract construct (a latent trait), it has the same measurement properties as a ruler. Its mathematical characteristics allow a transformation from binary or ordinal answer patterns to a linear scale, which makes the analysis more accurate.

1.2 Background of the Study

From several applications of the Rasch model to rating scales, various benefits of Rasch analysis have been identified. First, the Rasch model is able to construct linear measures from ordered nominal data in a simple and practical way, so that subsequent statistical analyses can be applied without concern for linearity. Second, parameter estimates are independent of the particular individuals and items used. Third, since both item difficulty and individual ability are located on the same scale, the testing results can be interpreted within a single frame of reference. Due to these features, it has been reported that the application of the Rasch model is advantageous for constructing objective and additive scales (Bond and Fox, 2001).

Rasch (1960), cited in Othman et al. (2011), also declared that the Rasch model is a reliable and suitable way of assessing students' ability. Ghulman and Mas'odi (2009) declared that Rasch measurement is beneficial because its predictive feature can overcome missing data.

A study done by Saidfudin et al. (2010) proved that the Rasch model can categorize grades into learning outcomes more accurately, especially when dealing with a small number of sampling units. Aziz et al. (2008) also applied the Rasch model to validate the construct of a measurement instrument. Meanwhile, Osman et al. (2012) stated that the person and item distribution map (PIDM) can give a clear overview of the students' learning effectiveness based on data placed on a linear scale of measurement.

Therefore, this study focuses on using the Rasch model as an assessment tool that enables researchers to measure general problem solving competences. It can be used to evaluate the reliability and quality of the Critical Thinking Problem Solving Test (CTPST) questions and to check whether these questions are calibrated to students' abilities.

1.3 Problem Statement

The Rasch model supports generalizability across samples and items, allows for testing of unidimensionality, produces an ordered set of items, and identifies poorly functioning items as well as unexpected responses. In this study, the solving of problems involving critical thinking skills is evaluated. Accordingly, the study is proposed to determine the effectiveness of the Critical Thinking Problem Solving Test (CTPST) in developing this ability and the level of critical thinking problem solving abilities by faculty.

1.4 Objectives of the Study

In view of the above stated requirements and problems, the present research aims at the following main objectives:

(i) To validate Critical Thinking Problem Solving Test (CTPST) by using


Rasch model.

(ii) To identify the critical thinking level in solving problem for each
faculty through Winsteps 3.81 and Statistical Package for Social
Science (SPSS) version 16.0.

1.5 Significance of the Study

This study focuses on establishing the reliability and validity of the questions and on students' performance. The computer software Winsteps is able to handle a large sample of respondents and items with little computational effort. The main contributions of the research are summarized as follows:

(i) Analysis of the reliability and validity of the problems using Winsteps.

(ii) Evaluation of the students' and faculties' performance.

1.6 Scope of the Study

In this study, routine and non-routine problems are taken into account in the assessment tool. The respondents are first year undergraduate students from selected faculties in UTM. There is a total of 981 students, of whom 441 are male and 540 are female. The sample was chosen randomly to obtain more accurate results.

The instrument for this study is the Critical Thinking Problem Solving Test (CTPST). The collected data will be analyzed from the output of Winsteps version 3.81.0, which will be used to interpret the validity and reliability of the CTPST in terms of person and item separation respectively, misfit items and unidimensionality. In addition, the Statistical Package for the Social Sciences (SPSS) version 16.0 will be used to determine the critical thinking level for each faculty.

1.7 Definition of Terms

In this study, a few terms related to the Rasch model are used. They are defined below:

1.7.1 Latent Trait

This term refers to certain human attributes that are not directly measurable. In latent trait theory, a person's performance can be quantified and the values are used to interpret and explain the person's test response behavior. Frequently, trait and ability are used interchangeably in the literature (Andrich, 1978).

1.7.2 Logit

The logit (logarithm of odds) is the unit of measurement when the Rasch model is used to transform raw scores obtained from ordinal data into log odds ratios on a common interval scale.

When the function's parameter represents a probability p, the logit function


gives the log-odds or the logarithm of the odds as equation (1.1).

\text{logit}(p) = \ln\!\left(\frac{p}{1-p}\right) \qquad (1.1)

A logit has the same characteristics as an interval scale in that the unit of
measurement maintains equal differences between values regardless of location. The
value of 0.0 logit is routinely allocated to the mean of the item difficulty estimates
(Bond and Fox, 2001).
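To illustrate the transformation in equation (1.1), the following is a minimal Python sketch (illustrative only; the study itself relies on Winsteps, and the function name logit is chosen here purely for illustration).

import math

def logit(p):
    # Log-odds of a probability p, as in equation (1.1)
    return math.log(p / (1 - p))

# A proportion-correct of 0.75 corresponds to about +1.10 logits,
# while 0.5 corresponds to 0.0 logits (the conventional item-mean origin).
print(logit(0.75), logit(0.5))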

1.7.3 Rating Scale Model

It is one of the Rasch family models developed by Andrich (1978). The rating
scale model can be applied to polytomous data obtained from ordinal scales or Likert
scales. In the item response theory framework, the rating scale model is categorized
as a one-parameter logistic model.

1.8 Outline of the Study

This dissertation contains six chapters. Chapter 2 provides information about the model used in carrying out the study, namely the Rasch model; it also reviews the literature on critical thinking and problem solving. Chapter 3 presents the research methodology adopted in carrying out the work and explains the descriptive statistics used in the Rasch analysis. Chapter 4 presents the framework for the validation of the items, the faculties' achievement, and a discussion of the results obtained from Winsteps version 3.81.0. Chapter 5 discusses the performance of each faculty in the CTPST based on the SPSS version 16.0 outputs. Finally, the last chapter gives the conclusions of the study and recommendations for future work.
CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

The first section of this chapter compares a number of characteristics of Rasch model measurement, particularly those related to item generation and reliability. The chapter then briefly discusses the meaning of critical thinking and problem solving. It also reviews related work on Rasch analysis carried out by other researchers.

2.2 Rasch Model

The Rasch model suggests an alternative theory for constructing measurement that is based on a strict mathematical formula. It combines the ordering features of Guttman scaling with a more realistic probabilistic framework (Bond and Fox, 2001). Based on Wright and Stone (1979), the Rasch model also puts particular emphasis on covering the entire continuum and requires the inclusion of items of different intensity to achieve acceptable measures. This feature is considered particularly useful for developing a measurement for the CTPST.

One major problem in measurement is the relationship between the person being measured and the instrument involved. The performance of a person is known to be dependent on which instrument is used to measure his or her trait (Khairani and Razak, 2012). However, this shortcoming is avoided by the procedure of conjoint measurement in the Rasch model. In conjoint measurement, the unit of measurement is not the examinee or the item, but rather the performance of an examinee with respect to a particular item.

The Rasch model is a measurement method that uses data from students' assessments and transforms them onto a 'logit' scale, thereby turning the assessment outcome into a linear scale with equal intervals (Osman et al., 2012). Rasch analysis produces a reliable, repeatable measurement instrument instead of merely establishing a 'best fit line' (Aziz et al., 2008).

In the case of the Rasch model for a dichotomous item where there are only
two response categories, the mathematical function of the item characteristic curve is
given by equation (2.1) (Rasch, 1960).

P(\theta) = \frac{e^{(\beta_{n} - \delta_{i})}}{1 + e^{(\beta_{n} - \delta_{i})}} \qquad (2.1)

where θ = random variable indicating success or failure on the item,
βn = person parameter (the person's ability on the latent variable scale),
δi = item parameter (the item difficulty on the same latent variable scale),
e = base of the natural logarithm or Euler's number, approximately 2.7183.

From equation (2.1), the probability of success is a function of the difference


between a person's ability and the item difficulty. When the ability equals the item
difficulty, the probability of success is 0.5.

The Rasch exponential expression is a logistic function, which produces a sigmoidal ogive; it can be transformed into a simpler form by taking logarithms:

\ln\!\left[\frac{P(\theta)}{1 - P(\theta)}\right] = \ln\!\left[e^{(\beta_{n} - \delta_{i})}\right] = \beta_{n} - \delta_{i} \qquad (2.2)

Equation (2.2) shows that βn − δi (the difference between the person's ability and the item difficulty) is the logarithm of the odds of success of the person on the item. Therefore, the measurement unit of the scale is known as the "logit" for both ability and item difficulty.

The log odds of a successful event, ln[P(θ)/(1 − P(θ))], is thus reduced to the expression termed the logit, and can be construed simply as the difference between the person ability, βn, and the item difficulty, δi, as in equation (2.3).

\text{logit} = \ln\!\left[\frac{P(\theta)}{1 - P(\theta)}\right] = \beta_{n} - \delta_{i} \qquad (2.3)
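As an illustration of equations (2.1) to (2.3), the short Python sketch below (illustrative only; the study itself uses Winsteps) evaluates the Rasch probability of success and the corresponding logit for a hypothetical person ability beta_n and item difficulty delta_i.

import math

def rasch_probability(beta_n, delta_i):
    # Probability of success on item i by person n, as in equation (2.1)
    return math.exp(beta_n - delta_i) / (1 + math.exp(beta_n - delta_i))

def rasch_logit(beta_n, delta_i):
    # Log-odds of success, equations (2.2)-(2.3): simply beta_n - delta_i
    p = rasch_probability(beta_n, delta_i)
    return math.log(p / (1 - p))

# When ability equals difficulty, the probability of success is 0.5
print(rasch_probability(1.0, 1.0))   # 0.5
print(rasch_probability(1.0, 0.0))   # about 0.73
print(rasch_logit(1.0, 0.0))         # about 1.0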

One extension of the Rasch model that is generalized to polytomous


responses is the rating scale model (Andrich, 1978). It is used for multiple choice
items on tests, questionnaires and surveys that often have more than two response
options such as Likert scales, rating scales and educational assessment items which
are intended to indicate increasing levels of competence or attainment. Responses to
items may reflect one or more underlying or unobserved traits, abilities, or attitudes. The goal in modeling responses to items is to measure individuals' values on the latent trait.

In order for Rasch model measurement to yield “examinee-free” item difficulty and “item-free” examinee ability measurement, two important assumptions must be met. Firstly, the data must meet the unidimensionality assumption, that is, they represent a single construct: all the items forming the questionnaire measure only the latent trait under study. Secondly, the Rasch model requires that the data fit the model (Khairani and Razak, 2012), that is, local independence (the response to a given item is independent of the responses to the other items in the questionnaire).

From the study of Ghulman and Mas'odi (2009), through the Rasch model and the learning domains of Bloom's Taxonomy, the authors identified the reasons for behavioral change based on Bloom's Taxonomy. Two findings were shown: first, whether the learning process is at risk, and second, whether the teaching process needs to be revised. Rasch measurement has proven very useful, with its predictive feature able to overcome missing data.

Khairani and Razak (2012) said that there is growing evidence in favor of the Rasch model as having the capacity to resolve some of the elementary issues in measurement. Nevertheless, in order to uphold construct validity, the model requires more evidence, especially on the correspondence between the theoretical perspective and the observable behaviors. Test developers need a thorough understanding of the measured construct, especially information on the relative difficulties of the items, so that they can conceptualize the measured construct.

2.2.1 Fit Statistics

In Rasch analysis, researchers can use the mean, μ, to determine the performance of the students. The studies of Knutson et al. (2010) and Nopiah et al. (2012) showed that if the person mean, μP, is less than the item mean, μI (μP < μI), the students are below the expected performance in answering the instrument, and vice versa.

2.2.2 Misfit

Ghulman and Mas'odi (2009), Othman et al. (2011) and Osman et al. (2012) stated that a simple three-step comparison procedure can be used to figure out which items do not fit the Rasch model. The three statistics are the point measure correlation (PMC), the outfit mean square (OMNSQ) and the outfit z-standard (ZSTD).

Nopiah et al. (2012) stated that a small correlation value means that many students could not answer the question. Values of OMNSQ > 1.5 and ZSTD > 2 show the inability of poor students to answer a difficult question, while OMNSQ < 0.5 and ZSTD < -2 show that poor students cannot answer an easy question.

From the output statistics of Rasch analysis, researchers agree that to identify a misfit item in an instrument, the PMC, outfit MNSQ and outfit ZSTD values must all be out of range. If one of the values is within the range, the item can still be accepted as an instrument item. The accepted ranges are discussed in Chapter 3.

2.2.3 Person and Item Reliability

Khairani and Razak (2012) claimed that if the reliability of the item difficulty measures is high (0.99), the ordering of item difficulty is replicable with other comparable samples of examinees. Meanwhile, the consistency of the examinees' measures, which is equivalent to Cronbach's alpha, was also high (0.90), implying that it is highly likely that the ordering of the examinees' proficiency can be replicated.

2.2.4 Person and Item Distribution Map

The person and item distribution map (PIDM) is similar to a traditional histogram tabulation; however, the PIDM allows both persons and items to be mapped together side by side to give a better view of how the persons relate to the respective items (Nopiah et al., 2012; Othman et al., 2011). In the Rasch model, each item is given a linear and objective measure of the difficulty of endorsing that item, so that the items can be ranked from the least to the most difficult (Wright and Masters, 1982). Hence, the persons' abilities and the relevant item difficulties can be illustrated clearly.

Othman et al. (2011) shared that the PIDM scale can indicate the most difficult item and the most able test takers. It can also identify redundancies among the items measured, so that researchers can decide whether changes to the instrument are needed.

Nopiah et al. (2012) indicated in their research that a higher ranking in the PIDM means that the item is more difficult. The orthogonal arrow in the map shows the gap between two items. The wider the gap, the more difficulty the students encountered when attempting to answer the question.

All the researchers concluded that the PIDM gives a clear view of the relationship between a person's performance (a higher ranking indicates a better performer) and item difficulty (a higher ranking indicates a harder item).

2.2.5 Internal Consistency

In their study, Osman et al. (2012) showed that internal consistency is determined through Cronbach's alpha, α. Their value of α = 0.66 was slightly higher than the acceptable level of α = 0.6, so the model was acceptable.

Saidfudin et al. (2010) also stated that, in normal statistical analysis, if the value of α is disturbingly low, such as α = 0.33, the evaluation test has to be disregarded as it is below the acceptable level of 0.6.

2.3 Critical Thinking

Watson Glaser Critical Thinking Appraisal (WGCTA) is used to measure


important abilities and skills involved in critical thinking. It is a psychometric test of
critical thinking and reasoning. In the study of Faux (1992), the hypothesis that critical thinking, as measured by the WGCTA, has no relationship with problem solving ability, as measured by the Whimbey Analytical Skills Inventory (WASI), was rejected. This is because the Pearson product moment correlation between the scores of the WGCTA and the scores of the WASI produced correlation coefficients (r values) of 0.65 and above, which imply a strong positive relationship between the scores of the WGCTA and the WASI.

In short, critical thinking enables us to implement, monitor and adjust solutions by effectively framing, exploring and reframing strategic issues. It involves questioning assumptions and making evaluations, and it requires the ability to identify and focus on relevant information when reaching conclusions. Hence, undertaking critical thinking exercises has a positive effect on the critical thinking skills of students (Johnstone, 2006).

2.4 Problem Solving

Problem solving is defined as a process used to achieve a best possible solution to an unknown, or a judgment subject to some constraints. Problem solving skills have always been significant in many lines of work as well as in the learning process. Results from the Programme for International Student Assessment (PISA) 2012 show that there is a relationship between socio-economic status, immigrant background and problem-solving performance. The Czech Republic, Turkey, Brazil, Malaysia, the Russian Federation, Serbia and Shanghai-China show that the impact of socio-economic status is stronger on problem solving than on mathematics performance. Besides, immigrant students in Brazil, Spain, Israel, Croatia, the Russian Federation and the United Arab Emirates perform better than non-immigrant students in problem solving. Conversely, in England (United Kingdom), Denmark, Italy, Australia, France, Belgium, Ireland, Canada, Serbia, Macao-China, Hong Kong-China and Singapore, immigrant students perform worse in problem solving than non-immigrant students (OECD, 2014).

On the other hand, in Australia, England and the United States, the best students in mathematics also have excellent problem-solving skills. These countries' good performance in problem solving was mainly due to strong performers in mathematics. This may suggest that in these countries, top performers in mathematics have access to opportunities to improve their problem-solving skills (Bortoli and Macaskill, 2014).

In conclusion, problem solving skills are important for enhancing one's learning. They help to improve students' confidence in approaching real world problems and help to create a more interesting and enjoyable learning environment, whether between student and instructor or between manager and employee.

2.5 Summary

This chapter has provided an overview of critical thinking and problem solving. It has also discussed the results from previous research papers on the Rasch model. The next chapter will focus on the methodology.
CHAPTER 3

RESEARCH METHODOLOGY

3.1 Introduction

This chapter presents the research methodology, describing the research activities carried out towards achieving the objectives of this research.

3.2 Research Framework

Figure 3.1 shows the flow chart of the research framework of the study.

3.3 Rasch Model Analysis

In this study, Rasch model analysis is carried out to identify the reliability and validity of the instrument, the person performance and item difficulties, and the misfit items in the instrument. Appendix A describes how to use the software.

[Figure 3.1 is a flow chart of the research framework: literature review on the Rasch model, critical thinking and problem solving; description of the Rasch model, critical thinking and problem solving and gathering of information; applying the Rasch model to identify the reliability of the instrument, the validity of the instrument, the person performance and item difficulties of the instrument, and the misfit items in the instrument; and then applying the Statistical Package for Social Science (SPSS) to identify the critical thinking problem solving level for each faculty.]

Figure 3.1 Framework of the Research Methodology

3.3.1 Identify the Reliability of the Instrument

There are two ways to identify reliability in this study: Rasch reliability and internal consistency.

3.3.1.1 Rasch Reliability

The reliability of an instrument is examined to determine its overall quality, in particular to check the fit of the items. Each time any recoding or collapsing of categories is made, or items are removed, it is important to recheck the impact of these changes on the reliability of the instrument to ensure that the changes have brought about an overall improvement in its quality.

For item and person reliabilities, a value close to 1.0 is considered good
reliability because the value indicates the percentage of observed response variance
that is reproducible. Table 3.1 displays the person and item reliability level by
Sumintono and Widhiarso (2013).

Table 3.1: Person and Item Reliability Level


Range Reliability Level
< 0.67 Weak
0.67 - 0.80 Enough
0.81 - 0.90 Good
0.91 - 0.94 Excellent
> 0.94 Extremely good
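As a small illustration of Table 3.1, a Python sketch mapping a reliability value to its qualitative level might look as follows (an illustrative helper only; the function name and the handling of boundary values are assumptions, not part of the Winsteps output).

def reliability_level(r):
    # Qualitative label for a Rasch person/item reliability, per Table 3.1
    if r > 0.94:
        return "Extremely good"
    if r > 0.90:
        return "Excellent"
    if r > 0.80:
        return "Good"
    if r >= 0.67:
        return "Enough"
    return "Weak"

print(reliability_level(0.99))  # Extremely good
print(reliability_level(0.73))  # Enough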

3.3.1.2 Internal Consistency

In statistics, Cronbach's alpha, α, is an internal consistency coefficient: it increases as the inter-correlations among the test items increase. It is usually used to estimate the reliability of a psychometric test for a sample of examinees. An accepted rule for describing internal consistency using Cronbach's alpha is given in Table 3.2.

Table 3.2: Internal Consistency


α Internal consistency
≥ 0.9 Excellent
0.7 ≤ α < 0.9 Good
0.6 ≤ α < 0.7 Acceptable
0.5 ≤ α < 0.6 Poor
α < 0.5 Unacceptable

3.3.2 Identify the Validity of the Instrument

There are a few perspectives in the Rasch model that are used to identify the validity of the instrument; they are discussed below:

3.3.2.1 Infit and Outfit Mean Square

The Rasch model provides two forms of fit statistics: the infit and outfit mean squares. Mean square (MNSQ) fit statistics show the size of the randomness, that is, the amount of distortion of the measurement system. The expected value is 1.0. Values less than 1.0 indicate that observations are too predictable (redundancy; the data overfit the model). Values greater than 1.0 indicate unpredictability (un-modeled noise; the data underfit the model).

As stated by Linacre (2002), mean square fit statistics in Rasch analysis are defined such that the model-specified uniform value of randomness is indicated by 1.0. A value above 1.5 indicates more than 50% unexplained randomness, and values greater than 2.0 suggest that there is more unexplained noise than explained noise, indicating more misinformation than information in the observations. Large mean squares therefore indicate segments of the data that may not support useful measurement.

Infit mean square means inlier-sensitive or information-weighted fit. It is


more sensitive to the pattern of responses to items targeted on the person, and vice-
versa. It is based on the chi-square statistic with each observation weighted by its
statistical information (model variance) as displayed in equation (3.1).

\text{Infit MNSQ} = \frac{\sum_{n,i} W_{ni}\, z_{ni}^{2}}{\sum_{n,i} W_{ni}} \qquad (3.1)

where z_{ni} is the standardized residual of person n on item i and W_{ni} is its model variance.

On the other hand, the outfit mean square is the outlier-sensitive fit statistic. It is more
sensitive to responses to items with difficulty far from a person, and vice-versa. This
is based on the conventional chi-square statistic. A chi-square statistic is the sum of
squares of standard normal variables. For ease of interpretation, this chi-square is
divided by its degrees of freedom to have a mean-square form and reported as
“Outfit” as presented in equation (3.2).

\text{Outfit MNSQ} = \frac{1}{N} \sum_{n,i} z_{ni}^{2} \qquad (3.2)

For the outlier-sensitive outfit mean square (OMNSQ), which is equivalent to a conventional chi-square statistic, this misinformation may be confined to explainable and easily remediable observations. An OMNSQ value greater than 2.0 indicates that there is more unexplained information than explained information in the observations of the category. Lastly, investigating category fit expressed in terms of the OMNSQ is useful for assessing the quality of category function.

For rating scales, a high mean-square associated with a particular category


indicates that the category has been used in unexpected contexts. Unexpected use of
an extreme category is more likely to produce a high mean-square than unexpected
use of a central category. In fact, central categories often exhibit over-predictability,
especially in situations where respondents are careful. Table 3.3 shows the range of
mean square value and its effects on the measurement.

Table 3.3: MNSQ Value and Effects on the Measurement


MNSQ Effects on the measurement
> 2.0 Distorts or degrades the measurement system. May be caused by
only one or two observations.
1.5 – 2.0 Unproductive for construction of measurement, but not degrading.
0.5 – 1.5 Productive for measurement.
< 0.5 Less productive for measurement, but not degrading. May produce
misleadingly high reliability and separation coefficients.

Cited from http://www.rasch.org/

In summary, item fit is an index of how well the items function in reflecting the trait. Therefore, in this study the accepted range is 0.5 < MNSQ < 1.5, which is also the range adopted by Othman et al. (2011) and Nopiah et al. (2012).
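To make equations (3.1) and (3.2) concrete, the following Python sketch computes the infit and outfit mean squares from observed scores, model-expected scores and model variances. All variable names and the toy data are illustrative assumptions; in this study these statistics are read directly from the Winsteps output.

import numpy as np

def outfit_mnsq(observed, expected, variance):
    # Outfit mean square (equation 3.2): unweighted mean of squared standardized residuals
    z2 = (observed - expected) ** 2 / variance
    return z2.mean()

def infit_mnsq(observed, expected, variance):
    # Infit mean square (equation 3.1): information-weighted mean square residual
    return ((observed - expected) ** 2).sum() / variance.sum()

# Toy example for one item across five persons
obs = np.array([1, 0, 1, 1, 0], dtype=float)
exp = np.array([0.8, 0.3, 0.6, 0.9, 0.4])
var = exp * (1 - exp)   # model variance for a dichotomous item
print(round(infit_mnsq(obs, exp, var), 2), round(outfit_mnsq(obs, exp, var), 2))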

3.3.2.2 Standardized Fit Statistics

Standardized fit statistics, or the outfit z-standard (ZSTD), are t-tests of the hypothesis that the data fit the model perfectly. They are also known as z-scores. If the data fit the model, the expected value is 0.0. Values less than 0.0 indicate that the data are too predictable; conversely, values greater than 0.0 indicate a lack of predictability. The standardized values can be positive or negative. Table 3.4 illustrates the range of standardized values and their effects on the measurement. This range is also adopted by Othman et al. (2011) and Nopiah et al. (2012) in their studies.

Table 3.4: ZSTD Value and Effects on the Measurement


ZSTD Effects on the measurement
≥ 3.0 Data much unexpected if they fit the model (perfectly), so they
probably do not. But, with large sample size, substantive misfit
may be small.
2.0 – 2.9 Data noticeably unpredictable.

-1.9 – 1.9 Data have reasonable predictability.


≤ -2.0 Data are too predictable. Other "dimensions" may be
constraining the response patterns.

Cited from http://www.rasch.org/

3.3.2.3 Point Measure Correlation

Point measure correlation is the correlation between the observations and the
Rasch measures as in equation (3.3). The range of the correlation is -1 to +1. Hence,
the accepted value for point measure correlation is 0.4 < PMC < 0.8 (Othman et al.,
2011 and Nopiah et al., 2012).

r = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{\left[ N \sum X^{2} - (\sum X)^{2} \right] \left[ N \sum Y^{2} - (\sum Y)^{2} \right]}} \qquad (3.3)

where X1,..,XN are the responses by the persons (or on the items), and Y1,..,YN are the
person measures (item easiness = - item difficulties).
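A minimal Python sketch of equation (3.3), computing the point measure correlation for one item from the responses and the person measures, is shown below (illustrative only; the toy numbers are invented and the study reads the PMC from the Winsteps item statistics).

import numpy as np

def point_measure_correlation(responses, person_measures):
    # Pearson correlation between responses to one item and the person measures (equation 3.3)
    x = np.asarray(responses, dtype=float)
    y = np.asarray(person_measures, dtype=float)
    n = len(x)
    num = n * (x * y).sum() - x.sum() * y.sum()
    den = np.sqrt((n * (x ** 2).sum() - x.sum() ** 2) * (n * (y ** 2).sum() - y.sum() ** 2))
    return num / den

print(round(point_measure_correlation([1, 2, 2, 3, 4], [-1.2, -0.3, 0.1, 0.8, 1.5]), 2))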

3.3.2.4 Person and Item Separation

The Rasch model provides two useful indices describing the separation of items on a variable and the separation of persons on a scale, respectively. The separation coefficient, S, is the ratio of the person (or item) adjusted standard deviation, SA, to the average measurement error, SE, which is the standard deviation of the error. Based on Nopiah et al. (2012), a low separation value means less variability of persons on the trait, and vice versa.

Person separation, SP, is used to classify people and estimates how well the scale identifies individual differences. It is calculated from equation (3.4). With a relevant person sample, low person separation (SP < 2) implies that the instrument may not be sensitive enough to distinguish between high and low performers. In short, more items may be needed.

S_{P} = \frac{SA}{SE} \qquad (3.4)

Item separation, SI, is used to verify the item hierarchy and estimates how well the scale separates the test items. It is determined from equation (3.5) (Chung, 2005). Low item separation (SI < 3) indicates that the person sample is not large enough to confirm the item difficulty hierarchy. Therefore, more respondents may be needed to establish the validity of the instrument.

S_{I} = \frac{SA}{SE} \qquad (3.5)
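The separation coefficients in equations (3.4) and (3.5) can be illustrated with a short Python sketch. It assumes that the observed standard deviation and the mean-square measurement error are already available (for example, from the Winsteps summary tables); the example values below are invented.

import math

def separation(observed_sd, mean_square_error):
    # Separation coefficient: adjusted (true) SD divided by the average measurement error
    true_variance = observed_sd ** 2 - mean_square_error   # remove error variance
    adjusted_sd = math.sqrt(max(true_variance, 0.0))
    return adjusted_sd / math.sqrt(mean_square_error)

# Illustrative values only (not taken from the thesis output)
print(round(separation(observed_sd=0.62, mean_square_error=0.10), 2))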

3.3.3 Identify the Person Performance and Item Difficulties of the Instrument

In the person and item distribution map (PIDM), the item mean, μI, serves as a threshold and is set to zero on the logit scale (Osman et al., 2012). Table 3.5 explains the symbols used in the PIDM, as agreed by Chung (2005).

The higher the location of an item above μI, the more difficult the item compared to an item at a lower location. Similarly, for the person distribution, the excellent students are located at the top of the map while the poor students are located at the bottom. Therefore, the level of a person's ability can be identified from the PIDM by looking at the separation between the person and an item on the map: the bigger the separation of the person above the item, the more likely the person is to succeed on that item.

Table 3.5: Keys in PIDM


Symbol Explanation
# Number of respondents
M Mean
S First standard deviation
T Second standard deviation

3.3.4 Identify the Misfit Item in the Instrument

Rasch analysis is used as a tool to validate the questionnaire; it can therefore identify misfit items, also known as outliers. Because it identifies how well the data fit the model, it is important to be able to diagnose immediately where the misfit is worst, and then to examine the misfitting items in the instrument in order to decide whether to edit or delete them. As stated in Chapter 2, a misfit item must fall outside the accepted ranges presented in Table 3.6.

Table 3.6: Accepted Range


Term Value
Outfit Mean Square (MNSQ) 0.5 < MNSQ < 1.5
Outfit Z-Standard (ZSTD) -2.0 < ZSTD < +2.0
Point Measure Correlation (PMC) 0.40 < PMC < 0.85
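As a sketch of how the rule in Table 3.6 could be applied programmatically (an illustration only, not part of the study's workflow), a Python function can flag an item as misfit when all three statistics fall outside the accepted ranges.

def is_misfit(outfit_mnsq, outfit_zstd, pmc):
    # Flag an item as misfit only if all three statistics fall outside the accepted range
    mnsq_ok = 0.5 < outfit_mnsq < 1.5
    zstd_ok = -2.0 < outfit_zstd < 2.0
    pmc_ok = 0.40 < pmc < 0.85
    # An item is retained if at least one statistic is still within range
    return not (mnsq_ok or zstd_ok or pmc_ok)

print(is_misfit(1.72, 2.4, 0.31))  # True: all three out of range
print(is_misfit(1.72, 1.1, 0.31))  # False: ZSTD is still acceptable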

3.4 Descriptive Summary

A descriptive summary is important because it is the analysis of data that helps describe, show or summarize the data in a meaningful way by numerical or graphical methods.

3.4.1 Respondents of the Study

The total participants in this study were 981 first year undergraduate students
in Universiti Teknologi Malaysia (UTM) from 12 faculties and schools. They are
from Faculty of Built Environment (FAB), Faculty of Biosciences and Medical
Engineering (FBME), Faculty of Civil Engineering (FKA), Faculty of Computing
(FC), Faculty of Electrical Engineering (FKE), Faculty of Chemical Engineering
(FChE), Faculty of Geoinformation and Real Estate (FGHT), Faculty of
Management (FM), Faculty of Science (FS), Faculty of Petroleum and Renewable
Energy Engineering (FPREE), Razak School of Engineering and Technology (RS)
and the Malaysia-Japan International Institute of Technology (MJIIT).

3.4.2 Mode

This quantitative research study is both descriptive and exploratory.


Therefore, the instrument gathers information regarding demographic variables and
an overall critical thinking score. The data were subsequently analyzed with
Statistical Package for the Social Sciences (SPSS) version 16.0.

Mode is one of the measures of central tendency. It is defined as the most


common value obtained in a set of observations or the element that occurs most often
in the collection.
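For illustration, the mode of a set of item scores can be obtained directly in Python (the study used SPSS for this step; the scores below are invented).

from statistics import mode

# Ratings (1-4) given to one CTPS item by a small group of students
scores = [2, 3, 2, 1, 2, 4, 2, 3]
print(mode(scores))  # 2 is the most frequent score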

3.5 Research Instrument

The test consists of two parts, Part A and Part B, where students are required to answer all questions, which involve only basic calculation and logical thinking. There are 23 questions, including the sub-questions. The instrument was assumed fit to measure the critical thinking ability of students. The students were given one hour to answer all the questions.

The CTPST measures five critical thinking subscales: analysis and interpretation, inference, evaluation and explanation, deductive reasoning, and inductive reasoning. The questions invite test takers to make interpretations, analyze information, draw reasonable inferences, recognize claims and reasons, and evaluate the quality of arguments.

In this study, four critical thinking skills are evaluated. They are the ability to define and analyse problems in complex, overlapping, ill-defined domains and make well-supported judgments (CTPS 1); the ability to apply and improve on thinking skills, especially skills in reasoning, analyzing and evaluating (CTPS 2); the ability to look for alternative ideas and solutions (CTPS 3); and the ability to 'think outside the box' (CTPS 4). Each of the CTPS skills has its own criteria, as shown in Table 3.7. For example, CTPS 1_1_1 evaluates students' ability to state and define the problem, CTPS 2_2_1 evaluates students' ability to state the 'how/when/where/what', and so on.

Students' answers are checked against the suggested solutions, and their standardized marks are then analysed using the rating scale in Table 3.8. From Table 3.8, each CTPS criterion is given a score from one to four. For example, for CTPS 1_1_1, a student is given a score of one if unable to state and define the problem, a score of two if the problem is stated and defined unsatisfactorily, a score of three if the problem is stated and defined satisfactorily, and the highest score of four if the problem is stated and defined accurately.

Table 3.7: Critical Thinking Problem Solving Rubrics

CTPS 1 - Descriptor: Ability to define and analyse problems in complex, overlapping, ill-defined domains and make well-supported judgment
  CTPS1-AA1 Problem interpretation - Performance criteria:
    1. Ability to state and define the problem
    2. Ability to identify related concepts/laws/rules/equations/models
    3. Ability to make correct assumptions
  CTPS1-AA2 Problem analysis - Performance criteria:
    2. Ability to compare and contrast

CTPS 2 - Descriptor: Ability to apply and improve on thinking skills, especially skills in reasoning, analyzing and evaluating
  CTPS2-AA1 Reasoning/Rationalizing - Performance criteria:
    1. Ability to state the 'why'
  CTPS2-AA2 Analyzing - Performance criteria:
    1. Ability to state the 'how/when/where/what'
  CTPS2-AA3 Evaluating - Performance criteria:
    1. Ability to re-conciliate the whole information
    2. Ability to make judgment

CTPS 3 - Descriptor: Ability to look for alternative ideas and solutions
  CTPS3-AA1 Comparative Study - Performance criteria:
    2. Ability to contrast different ideas
  CTPS3-AA2 Creative Thinking - Performance criteria:
    2. Ability to produce alternative ideas
  CTPS3-AA3 Innovative - Performance criteria:
    1. Ability to consider alternative course of action/policy options
    2. Ability to apply new approaches to problem solving
    3. Ability to realize the solution and put into practice

CTPS 4 - Descriptor: Ability to 'think outside the box'
  CTPS4-AA3 Creative and innovative - Performance criteria:
    1. Ability to contrast from mainstream ideas (towards problem solving)

Table 3.8: Critical Thinking Problem Solving Score Rubrics

Aspect of Assessment: CTPS1-AA1 Problem interpretation
Criterion 1 - Ability to state and define the problem
  Score 4: State and define the problem accurately
  Score 3: State and define the problem satisfactorily
  Score 2: State and define the problem but unsatisfactorily
  Score 1: Not able to state and define the problem
Criterion 2 - Ability to identify related concepts/laws/rules/equations/models
  Score 4: Identified the correct concepts/laws/rules/equations/models
  Score 3: The identified concepts/laws/rules/equations/models are mostly correct
  Score 2: The identified concepts/laws/rules/equations/models are mostly irrelevant
  Score 1: Failed to identify the intended concepts/laws/rules/equations/models
Criterion 3 - Ability to make correct assumptions
  Score 4: All assumptions are correct
  Score 3: Most of the assumptions are correct
  Score 2: Most of the assumptions are incorrect
  Score 1: None of the assumptions are correct

Aspect of Assessment: CTPS1-AA2 Problem analysis
Criterion 2 - Ability to compare and contrast
  Score 4: Made excellent comparison and contrasting
  Score 3: Made adequate comparison and contrasting
  Score 2: Made minimal comparison and contrasting
  Score 1: Unable to compare and contrast

Aspect of Assessment: CTPS2-AA1 Reasoning/Rationalizing
Criterion 1 - Ability to state the 'why'
  Score 4: Correct statement of the 'why'
  Score 3: Adequate statement of the 'why'
  Score 2: Statement of the 'why' with error
  Score 1: Unable to state the 'why'

Aspect of Assessment: CTPS2-AA2 Analyzing
Criterion 1 - Ability to state the 'how/when/where/what'
  Score 4: Correct statement of the 'how/when/where/what'
  Score 3: Adequate statement of the 'how/when/where/what'
  Score 2: Statement of the 'how/when/where/what' with error
  Score 1: Unable to state the 'how/when/where/what'

Aspect of Assessment: CTPS2-AA3 Evaluating
Criterion 1 - Ability to re-conciliate the whole information
  Score 4: Effectively merged the information
  Score 3: Satisfactorily merged the information
  Score 2: Marginally merged the information
  Score 1: Unable to merge the information
Criterion 2 - Ability to make judgment
  Score 4: Made exemplary judgment
  Score 3: Made proficient judgment
  Score 2: Made marginal judgment
  Score 1: Unable to make judgment

Aspect of Assessment: CTPS3-AA1 Comparative Study
Criterion 2 - Ability to contrast different ideas
  Score 4: Made excellent contrast of different ideas
  Score 3: Made reasonable contrast of different ideas
  Score 2: Made inadequate contrast of different ideas
  Score 1: Unable to contrast different ideas

Aspect of Assessment: CTPS3-AA2 Creative Thinking
Criterion 2 - Ability to produce alternative ideas
  Score 4: Produced outstanding ideas
  Score 3: Produced acceptable ideas
  Score 2: Produced insufficient ideas
  Score 1: Unable to produce alternative ideas

Aspect of Assessment: CTPS3-AA3 Innovative
Criterion 1 - Ability to consider alternative course of action/policy options
  Score 4: High consideration to alternative course of action/policy options
  Score 3: Adequate consideration to alternative course of action/policy options
  Score 2: Barely able to consider alternative course of action/policy options
  Score 1: Unable to consider alternative course of action/policy options
Criterion 2 - Ability to apply new approaches to problem solving
  Score 4: Highly capable in applying new approaches to problem solving
  Score 3: Capable in applying new approaches to problem solving
  Score 2: Barely capable in applying new approaches to problem solving
  Score 1: Unable to apply new approaches to problem solving
Criterion 3 - Ability to realize the solution and put into practice
  Score 4: Highly proficient in realizing the solution and putting it into practice
  Score 3: Proficient in realizing the solution and putting it into practice
  Score 2: Low proficiency in realizing the solution and putting it into practice
  Score 1: Unable to realize the solution and put it into practice

Aspect of Assessment: CTPS4-AA3 Creative and innovative
Criterion 1 - Ability to contrast from mainstream ideas
  Score 4: Highly contrasting ideas
  Score 3: Contrasting ideas
  Score 2: Rarely contrasting ideas
  Score 1: Just follow mainstream ideas

In this study, the mode is used to describe the central position of each variable, accompanied by tabulated and graphical descriptions using tables and charts respectively.

3.6 Summary

This chapter discusses the research methodology adopted in evaluating the level of critical thinking in problem solving. The standard range of values for every factor is identified, and explanations of the output values based on these ranges are provided.
CHAPTER 4

RASCH MODEL ANALYSIS

4.1 Introduction

In this chapter, all the collected data are analyzed using Winsteps version 3.81.0. A full discussion is given of the validation of the Critical Thinking Problem Solving Test (CTPST) and of the results for persons and items, both in total and for each of the CTPS skills.

4.2 Summary Statistics

In Rasch analysis, there are two types of summary statistics: the person measure and the item measure. The results from the Rasch analysis are explained in detail as follows.

4.2.1 Person Measure

The person measure gives a summary of the sample of the study. Figure 4.1 illustrates the summary of the 981 measured persons. There is a fair person spread of 3.21 logits (from the maximum measured person at 0.95 to the minimum measured person at -2.26), with person separation SP = 1.60, and the internal consistency of the examinees' measures (equivalent to Cronbach's alpha) is 0.73, which is above the acceptable level of 0.60. This shows that more items are needed to distinguish high and low performers, and that there are inter-correlations among the CTPST items.

Figure 4.1 Summary of 981 Measured Person

As the values of separation and person reliability increase, they show more variability of persons on the trait and in the responses to the statements in the questionnaire; similarly, a higher reliability coefficient shows greater consistency of the data (Kasim and Annuar, 2011). The major finding is the person mean, μP = -0.26 logit, which is lower than the item mean, μI = 0. These values show that the students were below the expected performance in answering the questions, even though the solutions in the CTPST were designed with basic logical interpretations and reasonable decisions to measure the undergraduate students' critical thinking level without applying any specific mathematical models.

4.2.2 Item Measure

The item measure gives a summary of the instrument, as shown in Figure 4.2, which illustrates the summary of the 23 measured items. The item summary shows high reliability, 0.99, with an item spread of 1.67 logits (from the maximum measured item at 0.64 to the minimum measured item at -1.03). The reliability of the item difficulty measures is high (0.99), suggesting that the ordering of item difficulty is replicable with other comparable samples of respondents (Khairani and Razak, 2012). Therefore, the items in the CTPST are suitable to be applied to any first year undergraduate students regardless of faculty.

The item separation, SI = 13.85, indicates that there are 14 classifiable groups among the questions, namely CTPS 1_1_1 (ability to state and define the problem), CTPS 1_1_2 (ability to identify related concepts), CTPS 1_1_3 (ability to make correct assumptions), CTPS 1_2_2 (ability to compare and contrast), CTPS 2_1_1 (ability to state the 'why'), CTPS 2_2_1 (ability to state the 'how/when/where/what'), CTPS 2_3_1 (ability to re-conciliate the whole information), CTPS 2_3_2 (ability to make judgment), CTPS 3_1_2 (ability to contrast different ideas), CTPS 3_2_2 (ability to produce alternative ideas), CTPS 3_3_1 (ability to consider alternative courses of action), CTPS 3_3_2 (ability to apply new approaches to problem solving), CTPS 3_3_3 (ability to realize the solution and put it into practice) and CTPS 4_3_1 (ability to contrast from mainstream ideas towards problem solving).

Figure 4.2 Summary of 23 Measured Item

By referring to Figures 4.1 and 4.2, the zero point on the Rasch scale does not
represent zero critical thinking level. It is an artificial point representing the mean of
the item difficulties, calibrated by default to be zero, in Rasch measurement as
displayed in PIDM.

4.3 Person and Item Distribution Map

The person and item distribution map (PIDM) gives a better picture of how the students relate to the respective questions, as the items and the students are located along the same proficiency scale. It gives a clearer view of each person's ability and the relevant item difficulty. A higher ranking indicates that an item is more difficult and that the students at the top display higher ability; going down the map, the items become easier and the students display less ability. The orthogonal arrow shows the gap between two items: the wider the gap, the more difficulty the students encountered when attempting to answer the question.

Comparing each CTPS skill, Figure 4.3 shows that a total of nine items were designed to evaluate the skills involved in CTPS 1, with item difficulties ranging from -0.99 to 0.62. Referring to Figure 4.4, Q1_CTPS 1_1_2 was the most difficult question while Q8_CTPS 1_1_2 was the easiest, although both evaluate the same skill, namely students' ability to identify related concepts.

Figure 4.3 Item Statistics CTPS 1

Most of the students could not score full marks for Q1_CTPS 1_1_2 because they could not explain their answers, due to a lack of understanding of the question. However, there are students who answered all the items correctly, as they have higher logit values than Q1_CTPS 1_1_2. Conversely, there are students who were unable to answer the easiest question (Q8_CTPS 1_1_2) correctly. This is because these students could not interpret the relationship between performance and revision hours from the graph provided in the questionnaire.

Figure 4.4 Person and Item Distribution Map CTPS 1

For CTPS 2, item difficulties ranged from -0.47 to 0.59 (Figure 4.5) among eight items. Q6_CTPS 2_3_1 (ability to re-conciliate the whole information) was hard to answer, while Q10_CTPS 2_1_1 (ability to state the 'why') was easier for students to interpret, as illustrated in Figure 4.6.

Figure 4.5 Item Statistics CTPS 2

Question 6 with CTPS 2_3_1 was categorized as the hardest item measuring CTPS 2. This might be because the students could not merge the information in the given situation with the explanation of their answer, which is what the item requires in evaluating the ability to re-conciliate the whole information.

Question 7 with CTPS 2_2_1 is the easiest question among the items measuring CTPS 2, evaluating the students' ability to state the 'why'. However, the results show that more than half of the students were unable to give a correct response (Figure 4.6), because the suggestions they gave numbered fewer than three, whereas the question requires more suggestions to be listed.

For CTPS 3, item difficulties ranged from -0.52 to 0.52 (Figure 4.7) across five items. As displayed in Figure 4.8, item 2 (Q4_CTPS 3_1_2), which evaluates the students' ability to contrast different ideas, was the easiest question, while item 4 (Q5_CTPS 3_2_2), which tests the students' ability to strategize a method of solution, was the most difficult.

Although question 4 with CTPS 3_1_2 is the easiest question, many students were unable to answer it correctly; it can therefore be concluded that students struggle to look for alternative ideas. From question 5 with CTPS 3_2_2, it can be concluded that most of the university students are still unable to devise a strategy for a given situation.

Figure 4.6 Person and Item Distribution CTPS 2

Figure 4.7 Item Statistics CTPS 3



Figure 4.8 Person and Item Distribution CTPS 3

For CTPS 4, only one item was evaluated, so no range of item difficulties can be reported. More items therefore need to be added in order to measure CTPS 4 skills.

Figure 4.9 shows the students with the lowest score (-2.26 logit, by reference to Figure 4.1), who can be categorized as the students with the poorest ability.

Figure 4.9 Person and Item Distribution Map in Total

Similarly to the PIDM, the person and item histogram in Figure 4.10 also places person and item locations on a single scale. The smaller the proportion of correct responses, the higher the difficulty of an item and hence the higher the item's scale location. The histogram shows four bars for the items, so the questions can be classified into four groups, namely difficult, moderate, easy and very easy, by applying a rule of thumb (Sumintono and Widhiarso, 2013). Once the item locations are scaled, the person locations are measured on the same scale.
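As a rough illustration of this ordering (a PROX-style approximation that ignores the ability-spread correction, not the joint maximum likelihood estimate actually reported by Winsteps), an item's difficulty can be related to its proportion of correct responses p_i by taking the logit of the proportion incorrect and centring the values at zero:

\[
d_i = \ln\frac{1-p_i}{p_i}, \qquad D_i = d_i - \bar{d},
\]

so an item answered correctly by fewer students receives a larger d_i and sits higher on the scale.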

Figure 4.9 also shows that question 6 with CTPS 2_3_1 (ability to re-
conciliate the information) is the hardest question and question 8 with CTPS 1_1_2
(ability to identify related concepts) is the easiest question in the CTPST.

Question 10 with CTPS 1_1_2 (ability to identify related concepts) has the largest gap, indicating that students faced the most difficulty when attempting to answer it; this implies that students are unable to draw a conclusion or relationship from a given situation. In contrast, for question 10 with CTPS 2_3_1 (ability to re-conciliate the information), which requires students to make a conclusion based on the graph, most of the students answered correctly, implying that students are still able to think critically.

From Figures 4.9 and 4.10, it can be concluded that most of the students can answer the "moderate" level questions. Also, very few students answered the hardest question, Q6_CTPS 2_3_1, or the easiest question, Q8_CTPS 1_1_2, correctly; this might be due to students providing the wrong explanation in their answers and misunderstanding the question, respectively.

Figure 4.10 Person and Item Histogram in Total



4.4 Misfit

Rasch analysis helps to identify any item that is not suitable to be included in the instrument, that is, a misfit item. The total score (raw score) for an item is the number of respondents who answered that item correctly, and the total count shows that 981 students responded to the items. The measure is the logit position of the item; the bigger the value, the more difficult the item. As stated in Chapter 3, to identify misfit items, the following controls were applied to check item acceptability: 0.40 < PMC < 0.85, 0.50 < OMNSQ < 1.50, and -2.0 < ZSTD < 2.0.
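A minimal sketch of this screening rule is given below in Python. It is illustrative only: the analysis in this study was done through the Winsteps interface, and the file name and the column names PMC, OMNSQ and ZSTD are assumptions about an exported item-statistics table.

    import pandas as pd

    # Acceptable ranges used in this study (Chapter 3)
    PMC_RANGE = (0.40, 0.85)
    OMNSQ_RANGE = (0.50, 1.50)
    ZSTD_RANGE = (-2.0, 2.0)

    def is_misfit(row):
        """Flag an item as misfit only when ALL three criteria fall outside their ranges."""
        out_pmc = not (PMC_RANGE[0] < row["PMC"] < PMC_RANGE[1])
        out_omnsq = not (OMNSQ_RANGE[0] < row["OMNSQ"] < OMNSQ_RANGE[1])
        out_zstd = not (ZSTD_RANGE[0] < row["ZSTD"] < ZSTD_RANGE[1])
        return out_pmc and out_omnsq and out_zstd

    items = pd.read_csv("item_statistics.csv")      # hypothetical export of the item table in Figure 4.11
    items["misfit"] = items.apply(is_misfit, axis=1)
    print(items[items["misfit"]])                   # expected to list only Q2_CTPS 2_3_2

Under this rule Q8_CTPS 1_1_2 is retained, since only two of its three statistics fall outside the ranges, while Q2_CTPS 2_3_2 is flagged.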

The validity of a question can be determined from the analysis of the point measure correlation. From Figure 4.11, item Q1_CTPS 2_1_1 (ability to state the 'why') has a small correlation (0.21), which shows that many of the students could not answer the question and only a few could.

Figure 4.11 Item Statistics of 23 Measured Items

The figure also shows that item Q8_CTPS 1_1_2 (ability to identify related concepts) needs review. It does not fully meet the criteria of a quality question, with PMC = 0.34 < 0.40 and ZSTD = 3.4 > 2, although OMNSQ = 1.32 < 1.5; nevertheless, it is not considered a misfit item because not all the criteria fall outside the acceptable ranges.

Figure 4.11 also presents the 23 items sorted in descending order of the "Measure" column. Only one item (Q2_CTPS 2_3_2, with PMC = 0.24, OMNSQ = 1.53 and ZSTD = 9.9) was found to fall outside the acceptable regions. Further analysis of this misfit item should be undertaken as part of enhancing the instrument; two possible actions are rephrasing or deleting the item.

Figure 4.12 shows the scalogram, in which the 30 most excellent respondents are chosen as a reference to verify the misfit item. They were unable to answer Q2_CTPS 2_3_2 (item 3), which is categorized as a "moderate" question; excellent students should arguably be able to answer such a question easily.

The scalogram also shows that students (persons) 34, 83, 744 and 927 failed to score on question Q2_CTPS 2_3_2, which concerns the ability to make judgments, even though they are among the top students. This suggests that first year undergraduate students have yet to develop strong critical thinking skills.

Based on the scalogram, the top 100 students were chosen to compare faculty performance, as shown in Figure 4.13. The figure implies that FKE shows the best performance, with 17 of its students listed. On the other hand, poor performance is shown by FGHT and RS, with no students in the top 100; more critical thinking and problem solving assessment therefore needs to be carried out with these students. The figure also shows that most of the engineering faculties (FBME, FChE, FPREE, FKA and FKE) have students in the top 100, and the single highest performer is from FKA.
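The faculty ranking can be reproduced outside the Winsteps interface with a short script such as the sketch below (illustrative Python; the file name and the columns person_id, faculty and measure are assumptions about an exported person-measure table).

    import pandas as pd

    persons = pd.read_csv("person_measures.csv")      # hypothetical export: person_id, faculty, measure (logits)
    top100 = persons.sort_values("measure", ascending=False).head(100)

    # How many of the top 100 students each faculty contributes
    print(top100["faculty"].value_counts())           # FKE expected to lead with 17 students

    # Single highest performer and his or her faculty
    print(top100.iloc[0][["person_id", "faculty"]])   # reported in the text to be from FKA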

Figure 4.12 Scalogram for Top 30 Students

Figure 4.13 Performance of Top 100 Students

In conclusion, as depicted in Figure 4.11, the PMC values ranged from 0.21 to 0.57, with no item having a zero or negative value. This correlation indicates that all items work together in the same way in defining the critical thinking problem solving test. The means of the infit and outfit MNSQ, 0.99 and 1.00 respectively, were close to the value expected by the model, 1.00, suggesting that the amount of distortion in the measurement was minimal. Therefore, the CTPST is suitable for determining students' critical thinking level in solving problems.
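For completeness, the fit statistics referred to here are the usual Rasch mean-square residual statistics, whose model expectation is 1.00 (standard definitions, e.g. Wright and Masters, 1982, not study-specific quantities): with observed response x_{ni}, model expectation E_{ni} and model variance W_{ni},

\[
z_{ni} = \frac{x_{ni} - E_{ni}}{\sqrt{W_{ni}}}, \qquad
\mathrm{OUTFIT}_i = \frac{1}{N}\sum_{n=1}^{N} z_{ni}^{2}, \qquad
\mathrm{INFIT}_i = \frac{\sum_{n} W_{ni}\, z_{ni}^{2}}{\sum_{n} W_{ni}} .
\]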

4.5 Unidimensionality

An investigation of dimensionality was carried out to ensure that the CTPST measures only a single construct, the CTPS construct, and no other skills. The raw variance explained by the measures is 31.1%, compared with the 31.0% expected by the Rasch model (Figure 4.14). According to Sumintono and Widhiarso (2013), an instrument is unidimensional when the raw variance explained by the measures is at least 20%. The unexplained variance in the first contrast is 8.1%, which is still acceptable as it is well below the maximum level of 15%; this component can be regarded as "noise", questions that interfere with students' understanding while they try to answer.
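These two rules of thumb can be written compactly as a simple check (a trivial sketch in Python; the percentages are simply read off the Winsteps output reproduced in Figure 4.14).

    def unidimensional(variance_explained_pct, first_contrast_pct):
        """Rule-of-thumb check following Sumintono and Widhiarso (2013):
        measures should explain at least 20% of the raw variance and the
        unexplained variance in the first contrast should stay below 15%."""
        return variance_explained_pct >= 20.0 and first_contrast_pct < 15.0

    print(unidimensional(31.1, 8.1))   # True for the CTPST data in this study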

Figure 4.15 shows that there are three items with standardized residual correlations greater than or equal to 0.70. They are therefore considered the "noise" in the instrument: item 8 (Q4_CTPS 3_3_3), which evaluates students' ability to realize the solution and put it into practice by arranging nine sticks to form five equilateral triangles; item 10 (Q5_CTPS 3_2_2), which evaluates students' ability to strategize a method of solution by forming four smaller pieces of equal size and shape from a given paper; and item 12 (Q6_CTPS 2_3_1), which evaluates students' ability to re-conciliate the whole information by providing a correct explanation. These items may need to be rephrased to ensure more accurate results. Nevertheless, they can be accepted because the "noise" is below the maximum level of 15%. It can therefore be concluded that the CTPST is acceptable for measuring students' critical thinking level in solving problems.

Figure 4.14 Standardized Residual Variance

Figure 4.15 Identify Item with “Noise”

4.6 Summary

In the summary of the 981 measured persons, the person separation suggests that more items are needed to make it easier to distinguish between high and low performers. For the item separation, 14 CTPS skills are being evaluated.

The PIDM and scalogram show that the majority of the students are able to answer questions at the moderate level. FKE students have the highest achievement, with 17 students in the top 100. Some top students could not answer a moderate question, while the weakest student was found to have an ability level below the minimum item difficulty. In this study, most of the students were unable to score full marks when giving explanations.

In short, the overall performance of the students in the CTPST is below expectations, as the mean person measure, μP = -0.26 logit, is lower than the mean item measure, μI = 0 logit. From the misfit analysis, there is only one misfit item due to "noise"; with the low level of "noise" and the high item reliability, the items can be regarded as fitting the model. The unidimensionality check also confirms that the CTPST measures students' critical thinking level in solving problems using non-routine questions that capture CTPS skills and do not follow any specific mathematical problems; all the items are designed to be answered through logical thinking and decision making.
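To give the -0.26 logit gap a concrete meaning (a dichotomous approximation; the CTPST is actually scored on a four-category scale), a student of average ability attempting an item of average difficulty would succeed with probability

\[
P = \frac{e^{-0.26}}{1 + e^{-0.26}} \approx 0.44,
\]

that is, with less than an even chance of success.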
CHAPTER 5

DATA ANALYSIS

5.1 Introduction

In Chapter 4, the Critical Thinking Problem Solving Test (CTPST) was validated through the Rasch analysis. This chapter therefore discusses the demographic data of the participants, such as gender and faculty, and presents the results of the data analysis of the 981 respondents. All data collected were analyzed using the Statistical Package for the Social Sciences (SPSS) version 16.0 for Windows; the data were analyzed descriptively and the results are shown in tables and figures.

5.2 Respondents’ Demographics

The students' gender, race and faculty distributions were identified as follows:

5.2.1 Gender Distribution

Table 5.1 presents the students' gender distribution. A total of 441 male students and 540 female students participated in this study.

Table 5.1: Gender Distribution


Gender Frequency Percentage (%)
Male 441 45
Female 540 55
Total 981 100

5.2.2 Race Distribution

Based on Figure 5.1, 71% of the respondents were Malay students and 16% were Chinese students, while 5% and 8% of the respondents were Indian students and students of other races respectively.

[Pie chart: Malay 71%, Chinese 16%, India 5%, Others 8%]

Figure 5.1 Race Distribution

5.2.3 Faculty Distribution

The sample of this study comprises 981 respondents who are first year undergraduate students from 12 faculties and schools in Universiti Teknologi Malaysia (UTM), as illustrated in Figure 5.2 and Table 5.2. They are the Faculty of Built Environment (FAB), Faculty of Biosciences and Medical Engineering (FBME), Faculty of Civil Engineering (FKA), Faculty of Computing (FC), Faculty of Electrical Engineering (FKE), Faculty of Chemical Engineering (FChE), Faculty of Geoinformation and Real Estate (FGHT), Faculty of Management (FM), Faculty of Science (FS), Faculty of Petroleum and Renewable Energy Engineering (FPREE), Razak School of Engineering and Technology (RS) and Malaysia-Japan International Institute of Technology (MJIIT).

Table 5.2: Faculty Distribution


Faculty Frequency Faculty Frequency
FAB 58 FKE 94
FBME 124 FM 89
FC 54 FPREE 80
FChE 107 FS 93
FGHT 80 MJIIT 79
FKA 98 RS 25
TOTAL 981

Referring to Table 5.2 and Figure 5.2, at least 80 students from each of the engineering faculties took part in the study, led by FBME with 124 students (13%), followed by FChE with 107 students (11%). RS had the fewest participants in the CTPST, with only 25 students (3%) out of the 981.

[Pie chart: FAB 6%, FBME 13%, FC 5%, FChE 11%, FGHT 8%, FKA 10%, FKE 10%, FM 9%, FPREE 8%, FS 9%, MJIIT 8%, RS 3%]

Figure 5.2 Faculty Distribution (%)



In addition, more than half of the selected faculties, namely FAB, FC, FGHT, FM, FPREE, FS, MJIIT and RS, contributed less than 10% of the respondents each. Cooperation with the faculties' offices is important to ensure that all first year undergraduate students take part, as this would help in evaluating the students' critical thinking level.

5.3 Critical Thinking Problem Solving Level

The CTPST scripts were marked according to the CTPS rating scale, which has four levels, with one as the lowest score and four as the highest. The students' answers were recorded in SPSS, and the mode of the CTPS rating scale was calculated to identify the students' critical thinking level in solving problems.
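An equivalent computation outside SPSS might look like the sketch below (illustrative Python/pandas; the file name and the long-format columns faculty, item and score, where score is the 1-4 CTPS rating, are assumptions about how the marked responses could be stored).

    import pandas as pd

    responses = pd.read_csv("ctpst_scores.csv")      # hypothetical file: faculty, item, score (1-4)

    # Modal rating for every faculty-item pair, then count how many items fall at each mode
    item_mode = (responses.groupby(["faculty", "item"])["score"]
                          .agg(lambda s: s.mode().iloc[0]))
    mode_counts = item_mode.groupby("faculty").value_counts().unstack(fill_value=0)
    print(mode_counts)                               # rows: faculties; columns: modal score 1-4 (cf. Table 5.3)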

FKE students have the highest critical thinking level in solving problems, as FKE has the highest number of students obtaining a score of four. Figure 5.3 gives a clear comparison of the score frequencies across faculties.

[Bar chart: Rating Scale versus Faculty; frequency of modal scores (Score 1 to Score 4) for each faculty]

Figure 5.3 Mode on CTPST Rating Scale



In contrast, FGHT has the highest number of students obtaining the lowest score: 21 of the items have a modal score of one, and only one item was answered correctly (modal score of four) by its students. In other words, FGHT shows the lowest performance in the CTPST among the faculties.

In Table 5.3, most of the students obtained a score of one, followed by a score of four. This is because some of the participants did not solve the problems, possibly owing to the limited time provided, or because they did not know the method of solution due to their level of understanding and logical thinking towards the questions, for example in problem 5 with CTPS 3_2_2 (ability to strategize a method of solution).

Moreover, some of the students provided their answers without showing appropriate steps or detailed explanations. For example, in problem 3 with CTPS 2_3_1 (ability to re-conciliate the whole information), students did not list the complete information from the statement given. In short, the majority of the respondents did give explanations, but these were not appropriate; the students made impractical judgments without concrete reasoning, which might be due to wrong interpretation of, and lack of understanding towards, the questions.

Table 5.3: Mode on CTPST Rating Scale


Faculty    CTPST Rating Scale (Mode)
              1      2      3      4
FAB          16      0      1      6
FBME         15      2      1      5
FC           13      0      2      8
FChE         15      2      1      5
FGHT         21      1      0      1
FKA          16      1      1      5
FKE          12      2      0      9
FM           17      3      0      3
FPREE        12      2      1      8
FS           14      2      1      6
MJIIT        13      1      1      8
RS           17      1      0      5
TOTAL       182     19     12     73

5.4 Summary

In conclusion, the measure of central tendency (mode), histograms and pie charts were used in this part of the study. It is clearly shown that engineering students performed well in the CTPST, especially students from FKE, which agrees with the Rasch analysis in Chapter 4.

From the results, it can be concluded that the students' critical thinking level in solving problems needs to be enhanced during their four years of study, so that they are ready to face obstacles after graduation and to apply the knowledge and skills delivered by their lecturers.
CHAPTER 6

CONCLUSIONS AND RECOMMENDATIONS

6.1 Introduction

This chapter begins with the conclusions of the study, drawn from the analysis that was carried out. Besides the general conclusions, suggestions for future studies are also given in order to improve the work.

6.2 Conclusions

This study was carried out to determine the validity of the CTPST and the critical thinking level of first year undergraduate students in UTM. The main conclusions of this study are as follows:

(i) The items are suitable for all first year undergraduate students, as they involve only non-routine questions that capture CTPS skills and do not follow any specific mathematical problems, with high reliability and validity.

(ii) FKE students have the highest achievement in the CTPST. However, the overall achievement shows that the students have low critical thinking skills in solving problems.

(iii) The items in the CTPST are unidimensional; they measure only the critical thinking level of the respondents.

(iv) Question 6 with CTPS 2_3_1 (ability to re-conciliate the information) is the hardest question and question 8 with CTPS 1_1_2 (ability to identify related concepts) is the easiest question in the CTPST.

(v) More items are needed to enable the researcher to distinguish between high and low performers.

(vi) There is one misfit item, namely Q2_CTPS 2_3_2 (ability to make judgment).

The findings of this research suggest that each faculty needs to build the students' critical thinking skills by making curricular changes aimed at improving those skills.

6.3 Recommendations

Based on this study, some suggestions for future work can be considered. The recommendations for further research include:

(i) More questions developing CTPS 4 need to be added to increase the reliability.

(ii) Dual-language (bilingual) questions can be used to increase the respondents' understanding of the questions.

(iii) The identified misfit item, Q2_CTPS 2_3_2 (ability to make judgment), can be deleted or restructured to increase the validity of the test.

(iv) The study can be extended to other university students in Malaysia.

(v) Assessment of critical thinking and problem solving needs to be done separately.
REFERENCES

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.

Aziz, A. A., Mohamed, A., Arshad, N. H., Zakaria, S., Ghulman, H. A. & Masodi, M.
S. (2008). Development of Rasch-based Descriptive Scale in profiling
Information Professionals' Competency. Proceedings of International
Symposium on Information Technology, 2008 (ITSim 2008). 26-28 Aug. 1-8.

Bloom, B. S. (1956). Taxonomy of educational objectives. Handbook I: Cognitive domain. New York: McKay.

Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental
measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum.

Bortoli, L. D. & Macaskill, G. (2014). Thinking it through: Australian students' skills in creative problem solving (pp. 91).

Chung, H. (2005). Calibration and Validation of the Body Self-Image Questionnaire Using the Rasch Analysis. Doctor of Philosophy, University of Georgia.

Faux, B. J. (1992). An Analysis of the Interaction of Critical Thinking, Creative Thinking, and Intelligence with Problem Solving. Doctoral dissertation, Temple University Graduate Board.

Ghulman, H. A. & Mas'odi, M. S. (2009). Modern measurement paradigm in Engineering Education: Easier to read and better analysis using Rasch-based approach. Proceedings of 2009 International Conference on Engineering Education (ICEED 2009). 7-8 December. Kuala Lumpur, Malaysia. 1-6.

Johnstone, M. N. (2006). Augmenting Postgraduate Student Problem-Solving Ability by the Use of Critical Thinking Exercises. Proceedings of EDU-COM 2006 International Conference. 22-24 November 2006. Edith Cowan University, 245-253.

Kasim, R. S. R. & Annuar, A. (2011). Cognitive styles: Web portal acceptance items
measurement. Proceedings of 2011 IEEE International Conference on
Computer Applications and Industrial Electronics (ICCAIE). 4-7 Dec. 2011,
427-431.

Khairani, A. Z. B. & Razak, N. B. A. (2012). Advance in Educational Measurement: A Rasch Model Analysis of Mathematics Proficiency Test. International Journal of Social Science and Humanity, 2(3), 248-251.

Knutson, N., Akers, K. S. & Bradley, K. D. (2010). Applying the Rasch Model to Measure First-Year Students' Perceptions of College Academic Readiness. 13.

Linacre, J. M. (2002). Optimizing rating scale category effectiveness. [Comparative Study]. Journal of Applied Measurement, 3(1), 85-106.

Ministry of Education. (2013). Malaysia Education Blueprint 2013-2025.

Mourtos, N. J., Okamoto, N. D. & Rhee, J. (2004). Defining, teaching, and assessing
problem solving skills. Proceedings of 7th UICEE Annual Conference on
Engineering Education. Mumbai, India. 1-5.

Nopiah, Z. M., Rosli, S., Baharin, M. N., Othman, H. & Ismail, A. (2012).
Evaluation of pre-assessment method on improving student's performance in
complex analysis course. Asian Social Science, 8(16), 134-139.

OECD. (2014). PISA 2012 Results: Creative Problem Solving: Students' Skills in
Tackling Real-Life Problems (Vol. V, pp. 254): PISA, OECD.

Osman, S. A., Naam, S. I., Jaafar, O., Badaruzzaman, W. H. W. & Rahmat, R. A. A. O. K. (2012). Application of Rasch Model in Measuring Students' Performance in Civil Engineering Design II Course. Procedia - Social and Behavioral Sciences, 56(0), 59-66.

Othman, H., Asshaari, I., Bahaludin, H., Nopiah, Z. M. & Ismail, N. A. (2011).
Evaluating the Reliability and Quality of Final Exam Questions Using Rasch
Measurement Model: A Case Study of Engineering Mathematics Courses.
Kongres Pengajaran dan Pembelajaran. 163-173.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests.
Copenhagen: Danish Institute for Educational Research.

Saidfudin, M., Azrilah, A. A., Rodzo'An, N. A., Omar, M. Z., Zaharim, A. & Basri,
H. (2010). Easier learning outcomes analysis using Rasch model in
engineering education research. Proceedings of the 7th WSEAS international
conference on Engineering education. Corfu Island, Greece. 442-447.

Sumintono, B. & Widhiarso, W. (2013). Aplikasi Model Rasch untuk Penelitian Ilmu-ilmu Sosial (1st ed.). TrimKom Publishing House.

What is Rasch Analysis? Retrieved 05 May 2014, from http://www.rasch.org/



Williams, B., Onsman, A. & Brown, T. (2012). A Rasch and Factor Analysis of a
Paramedic Graduate Attribute Scale. Eval Health Prof 35(148), 22.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA
Press.

APPENDIX A
Description of Winsteps Software
1. Double click on the “Winsteps” shortcut icon on the computer desktop.

2. A pop-up screen as shown below will appear; click on the "Import from Excel, R, SAS etc." button.

3. Click on the green "Excel" button if the data are saved in Excel format.

4. Click on “Select Excel file” button.

5. Select the file and open it.

6. Next, an output as shown below will appear. A sentence highlighted in red means that an error exists, so editing is needed as discussed in the following step.

7. The variable labels need to be copied and pasted under "Item Response Variable" and "Person Label Variable" according to the study variables.

8. Next, click on the "Construct Winsteps file" button.

9. A pop-up will appear; the user needs to name the file and click "Save". An illustrative example of such a control file is sketched at the end of this appendix.

10. Winsteps will then scan and format the data until the following pop-up is shown.

11. Then, press the "Enter" key twice. All the choices on the tabs will be unlocked, and the researcher can click on "Output Tables" to obtain the selected analysis.

12. For example, choose “3.1 Summary Statistics”.

13. The output will be displayed in Notepad format. The researcher can save it and carry out further interpretation of the results.

14. Alternatively, the user can also click on "Graphs" to view the curves or the person and item histogram instead of the PIDM.
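For readers who prefer to prepare the control file by hand rather than through the Excel import described in steps 2-9, a minimal sketch of a Winsteps control file is given below. All values (file names, column positions, number of items, valid codes) are illustrative assumptions for a CTPST-like layout, only the first of the 23 item labels is shown, and the keywords should be checked against the Winsteps manual before use.

    ; ctpst.con - illustrative Winsteps control file (all values are assumptions)
    TITLE = "CTPST Rasch analysis"
    DATA  = ctpst_data.txt   ; hypothetical data file, one row of responses per student
    ITEM1 = 1                ; column of the first item response in each data row
    NI    = 23               ; number of items
    NAME1 = 25               ; column where the person label starts
    CODES = 1234             ; valid rating-scale codes
    &END
    Q1_CTPS 1_1_1
    END NAMES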
