This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325741

Exercise Generation and Student Cognitive Ability Research Based on ChatGPT and Rasch Model
Han Xue1, Yanmin Niu2*
College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China;

Corresponding author: Yanmin Niu (niuym@cqnu.edu.cn).


This work was supported in part by the Chongqing Normal University under Grant (20XLB035).

ABSTRACT In the context of generative artificial intelligence (AI), AIGCP (content generation-based AI
products), represented by ChatGPT, have attracted extensive attention in the field of education. This study
focuses on the discipline of university operating systems and adopts the Rasch model as the theoretical
foundation. By combining ChatGPT with existing question banks and using the bidirectional fine-grained
table method, it compiles questions that match the corresponding abilities for three different levels of
student groups. This aims to explore personalized question matching and student cognitive ability analysis
methods to support personalized teaching. The research findings indicate that ChatGPT is capable of
matching exercises of similar difficulty under the Rasch model, but its accuracy in generating exercise
content is relatively low, and the variety of exercise content is limited. Students' performance in overall
competency requires improvement. This study aims to leverage the combined strengths of ChatGPT and
traditional educational assessment methods to introduce an innovative approach to support personalized
instruction. It aims to establish the routine utilization of exercise creation by ChatGPT and personalized
analysis of student cognitive abilities, thereby better fulfilling the demands of education within the
classroom setting.

INDEX TERMS Generative artificial intelligence, Rasch model, Personalized question matching,
Cognitive ability, Operating system exercises

I. INTRODUCTION

At present, exercise design plays a crucial role in the field of education, and its quality directly impacts teachers' effective assessment of students' cognitive abilities. Traditional exercise design typically relies on teachers' experience and existing exercise resources such as Yuantiku, Wantioku, and Daxue Souti. However, while these resources are rich, they cannot meet the demand for personalized exercise design and lack integration of interdisciplinary exercises. Furthermore, teachers spend a significant amount of time designing exercises, limiting their ability to provide personalized guidance to students and uncover their cognitive abilities. However, with the rapid development of artificial intelligence (AI) technology, a plethora of AI-driven cognitive platforms (AIGCP) has emerged. For example, AIGCPs like ChatGPT have the potential to have a significant impact on education [1], including exercise design, presentation creation, student feedback generation, code generation, text and image generation, and more [2]. The emergence of ChatGPT offers new possibilities for exercise design and uncovering students' cognitive abilities.

Although research has been conducted on exercise design and the exploration of students' cognitive abilities, several challenges remain:

Exercise design relies on teachers' experience, leading to time-consuming and inefficient processes.

The Rasch measurement theory can assess exercise accuracy and student abilities but lacks methods for personalized exploration of students' cognitive abilities.

Leveraging AI technologies like ChatGPT opens up new possibilities for education, but the integration of these technologies to improve exercise design and the exploration of students' cognitive abilities needs further exploration.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4

The main motivation of this research is to address the above-mentioned challenges in exercise design and uncovering students' cognitive abilities in the field of education. The aim is to develop methods for personalized exercise matching and student cognitive ability exploration. Therefore, this study adopts the Rasch model as a theoretical foundation and combines ChatGPT's exercise design capabilities to create customized exercise tests for different levels of student groups, using the subject of university operating systems as an example. The main contributions of this research are twofold:

By combining ChatGPT and the Rasch model, it is possible to match exercises corresponding to the ability levels of different students, improving the efficiency of teacher exercise design and achieving the goal of personalized exercise matching.

Based on designed exercises and a bidirectional item map approach, it is possible to effectively explore the cognitive abilities of students at different levels, laying a solid foundation for personalized teaching.

The structure of this research is as follows: first, a review of relevant research literature related to exercise design and the exploration of students' cognitive abilities is provided to support the theoretical and practical aspects of this research. Second, the basic principles of the Rasch measurement theory are introduced. Then, the research process and experimental results of this study are detailed, explaining how the Rasch model and ChatGPT are combined and how personalized exercise design and student cognitive ability exploration are achieved through the bidirectional item map approach. Finally, the research results are discussed, major findings are summarized, and potential future research directions are explored.

II. RELATED WORK
Recently, there has been widespread attention to research on AI-based generative conversational models like ChatGPT. A study conducted by Abdulhadi Shoufan explored students' perceptions of ChatGPT, with students expressing appreciation for its conversational chat functionality, finding it interesting, inspiring, and helpful for learning and work. However, many students also recognize that using ChatGPT requires a good background knowledge as it cannot replace human intelligence. ChatGPT can and should be used for learning purposes, but students need to be aware of its limitations. Educators should try using ChatGPT and provide students with appropriate techniques for using it effectively, as well as helping them evaluate the generated responses [3].

From the perspectives of both students and educators, Md. Mostafizer Rahman and others discussed the potential opportunities and challenges of ChatGPT in education. The ChatGPT model can be applied to solve technical problems (such as engineering and computer programming) as well as non-technical problems (such as language and literature). However, it also faces challenges such as a lack of common sense, potential biases, difficulties in complex reasoning, and the inability to handle visual information. It is important to bear these limitations in mind when using ChatGPT and avoid blindly relying on these technologies. Immediate efforts should be made to address these disadvantages to ensure fairness for all [4].

Taking the teaching of quadratic equations in junior high school as an example, You Jiajing and others analyzed the opportunities and challenges of applying ChatGPT in the process of learning junior high school mathematics. They concluded that the ChatGPT model has the potential to assist students in learning junior high school mathematics, especially when teachers are involved. However, in the absence of teacher guidance, most students still struggle to directly engage in mathematical learning through interaction with ChatGPT. As AI technology continues to advance, it can provide a new approach to promoting educational equity [5].

The Rasch model has been widely employed in various fields, including education and psychology, and has matured as a research tool. It has demonstrated considerable utility in the context of exercise testing. For instance, Amir Mohamed Talib and colleagues utilized the Rasch model to assess the performance of students in the Information Technology Foundation (IT280) course's final exam at an Arab Muslim University. Through a study involving 150 second-year students, they measured students' knowledge and understanding of the IT280 course based on Bloom's cognitive objectives across three levels. The research findings categorized students into four levels: fail (10%), average (42%), good (18%), and excellent (24%), assessing the extent to which students achieved cognitive objectives at the three levels. The study revealed that students performed relatively well in the IT280 final exam. These research results can guide the development of appropriate teaching methods and improvements in the quality of exam items [6].

Wufati and colleagues employed the Rasch model to construct monthly exam papers in the field of political science, collecting response data from 195 students. They explored the effectiveness of the exam papers, analyzed each student's knowledge mastery underlying their scores, and assessed students' mastery of various cognitive levels. Through data analysis, they generated radar charts for visual output, facilitating the normalized use of student cognitive analysis in a blended classroom setting [7].

In another study, Liu Yixuan and collaborators focused on the maintenance of a question bank for assessing academic abilities in a particular subject. They utilized Rasch model-based linking evaluation techniques and question bank maintenance solutions. After designing linkages, they reorganized the question bank and evaluated the quality of the linkages by analyzing the dimensions of assessed abilities, difficulty indicators,
functional differences, and more between the new and old question banks. Ultimately, 18 questions that met the testing criteria were incorporated into the question bank. The Rasch model-based question bank maintenance approach offered a systematic set of procedures and evaluation criteria, potentially aiding in the construction of subject-specific question banks in the future [8].

As research into computer-adaptive testing continues to advance, exercise design capabilities have greatly improved. Yang Zhiming and colleagues conducted a systematic analysis of the basic concepts, pattern design, eight major assumptions, and eight major advantages of computer-adaptive multi-stage testing. They particularly explored scoring methods based on raw scores and response patterns. Their analysis holds significant reference value for the development of high-level computerized adaptive multi-stage assessment systems and platforms [9].

In another study, Liu Kai and fellow researchers administered a series of questions related to eating disorders to 1025 Chinese undergraduate respondents in a paper-and-pencil test. They constructed a CAT-ED item bank using 133 items from four validated Chinese eating disorder scales and conducted relevant analyses. The effectiveness and rationality of CAT-ED were examined by calculating validity, sensitivity, and specificity correlations with standards. The final item bank comprised 77 items that met the testing requirements [10].

The advent of computer-adaptive testing has improved the accuracy and relevance of exercise design. However, there is still room for improvement in terms of design efficiency, and further research is needed to refine the parameters used in these enhancements.

Most of the mentioned studies focused on specific aspects of exploration, with limited validation of ChatGPT's role in exercise design. Moreover, prior research did not utilize ChatGPT for exercise generation, resulting in slower exercise design. Nonetheless, ChatGPT demonstrates substantial potential in exercise design. While the Rasch model has been predominantly used to assess exercise effectiveness, it has not been fully utilized to set standards for exercise difficulty and student ability, nor has prior work considered the significant advantages that can arise from combining both approaches. Therefore, this study is based on the synergy between ChatGPT and the Rasch model, employing the bidirectional item map approach. Using the example of a university operating systems course, it aims to design different exercise tests for students at various levels and explore methods for personalized exercise matching and the assessment of student cognitive abilities based on test data. This research seeks to facilitate personalized and intelligent teaching for educators and individualized learning for students in the context of generative artificial intelligence.

III. RASCH MEASUREMENT THEORY
In the design of exercises, the Rasch model is widely employed. The Rasch model, proposed by the Danish mathematician Georg Rasch in 1960, is a latent trait model. This model, rooted in the realm of natural sciences for objective measurement, establishes a set of objective standards for measurement within the field of social sciences, ensuring a more objective and reliable information provided by measurements [11].

Among the various statistical models constructed based on Item Response Theory (IRT), the Rasch model stands out as one extensively applied model. In comparison to classical measurement theory, Item Response Theory overcomes several challenging issues, such as excessive dependence on samples [12]. Within the Rasch model, the assessment of item parameters is independent of the group of participants, and its key feature lies in its objectivity. However, this objectivity holds true only when the data conforms to the Rasch model. The model attributes individual differences in ability to their problem-solving capabilities rather than being influenced by the set of problems solved. In other words, individuals with higher abilities should always have a greater probability of solving any problem related to that ability [13]. Specifically, when the difficulty of an item matches the student's ability, the probability of the student answering it correctly is 50%. As the difficulty of the item falls below the student's ability, the probability of the student answering it correctly surpasses 50%, with a greater disparity resulting in a higher probability. Similarly, when the item's difficulty exceeds the student's ability, the probability of the student answering it correctly drops below 50%, with a greater disparity leading to a lower probability.

In the Rasch model, the evaluation of item parameters is independent of the group of subjects, making it particularly valuable for the construction of large-scale exercise banks.

In the Rasch model, the dependent variable represents the binary response of a specific individual to a particular item (e.g., correct or incorrect, yes or no), while the independent variable is the difference between the individual's trait score θ_s and the item difficulty β_i. The Rasch model has two versions, namely the logarithmic form and the probability form, for representing the dependent variable. In the logarithmic version, the dependent variable is the natural logarithm of the ratio of the individual's probability P_is of success on item i to the probability of failure, as shown in equation (1). In the second version, the dependent variable is the individual's simple probability of success on item i, P(X_is = 1), as shown in equation (2). In this equation, θ_s and β_i denote the person parameter and the item difficulty, respectively, while exp(θ_s - β_i) is the natural exponential of the difference between the person and item parameters [14].

$$\ln\left[\frac{P_{is}}{1 - P_{is}}\right] = \theta_s - \beta_i \tag{1}$$

$$P(X_{is} = 1 \mid \theta_s, \beta_i) = \frac{\exp(\theta_s - \beta_i)}{1 + \exp(\theta_s - \beta_i)} \tag{2}$$

However, the Rasch model exists in an idealized state. It assumes that students' performance on items is solely influenced by their knowledge. Yet, in practical teaching scenarios, students' response patterns are influenced by a multitude of factors. Apart from individual knowledge mastery, factors such as mindset, environment, physical condition, and more can also impact students' responses. For instance, some students may possess the requisite knowledge for a given item but still provide incorrect answers due to careless reading or other factors. Given that the Rasch model is a theoretical construct, its application necessitates a process of fit analysis, with the fit indices needing to meet certain criteria before utilization can proceed, as detailed in Section 5.2.

While the Rasch model serves as a valuable tool for measuring the abilities of students and the difficulty of items, facilitating the analysis of their relationship, and offering reference standards for item difficulty to enhance the precision of item design, it does not necessarily lead to efficient question generation by educators. Despite its accuracy-enhancing properties, educators still rely on experiential-based approaches for item creation, a practice that ensures accuracy but does not optimize time utilization. Prior research has primarily focused on the accuracy of item design, employing the Rasch model to evaluate the precision of designed items, yet it has not effectively addressed the time-consuming nature of item development.

However, the advent of generative artificial intelligence products, such as ChatGPT, presents an opportunity to swiftly generate items based on the scope of knowledge provided by educators, thereby significantly boosting the efficiency of item creation without compromising precision. Integrating AI-Guided Content Production (AIGCP) with the Rasch model holds the potential to amplify the accuracy and efficiency of educator-led item design, enabling a more profound exploration of students' cognitive abilities and advancing the evolution of intelligent pedagogy.

This study is based on the integration of AIGCP and the Rasch model, using the bidirectional fine-grained table method. Taking the subject of operating systems in college as an example, tailored exercise tests are designed for students at different levels. Based on the test data, methods for personalized exercise matching and students' cognitive abilities are explored, supporting the personalized and intelligent development of teaching for teachers and learning for students in the context of generative artificial intelligence. However, it is important to note that in the application of generative artificial intelligence products and the Rasch model, ethical issues must be carefully addressed, and excessive reliance on technology and misuse should be avoided to ensure the safe use of artificial intelligence in education.

IV. METHODS AND MATERIALS

A. Experimental Design
In previous research, the emphasis on exercise design has often been concentrated in fields such as essay writing, translation, mathematics (inductive), and language arts (reading) [15]. However, exercise design in the field of computer science has been relatively rare, and even in cases involving computer-related exercises, it has often been confined to assessing foundational computer knowledge. Nevertheless, knowledge of operating systems, as a mandatory course for computer science students, encompasses topics like system overview, process management, CPU scheduling, memory management, and file management, constituting a crucial subject for evaluating students' comprehensive abilities.

To effectively tap into students' cognitive abilities and demonstrate the universality of the research, this study employs the context of university-level operating systems courses. It collects student answer data from the first five chapters through the Chongqing Higher Education Smart Education Platform.

In order to achieve personalized objectives, it is necessary to classify students into different levels to facilitate exercise design and cognitive ability exploration. In traditional exercise design, the concept of stratification was not considered, and the focus was solely on whether the exercises were suitable. With the assistance of the Rasch model, it is possible to stratify exercises and student abilities. However, previous research did not establish clear criteria for this stratification. Given the large quantity of personalized exercise items, relying solely on manual creation would decrease efficiency. Using ChatGPT can significantly enhance the efficiency of exercise generation. The detailed steps are as follows:

Initially, utilizing Ministep 5.0.0 software and Whit's model in conjunction with answer data, students are categorized into three levels. Regarding exercise difficulty levels, on average, -1.0 represents easy exercises, 0.0 signifies moderate difficulty, and 1.0 represents high difficulty [16]. Based on the exercise difficulty standards and student answer data, students are classified as A (higher-performing learners), B (average-performing learners), and C (lower-performing learners), with the student ability ranking as A > B > C.
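
As a minimal numerical sketch of equations (1) and (2), the following Python snippet evaluates the success probability of one student against the three reference difficulties mentioned above (-1.0, 0.0, and 1.0 logit). The ability value of 0.0 logit and the function names are assumptions chosen for illustration; they are not taken from the study or from the Ministep software.

```python
import math


def rasch_probability(theta: float, beta: float) -> float:
    """Probability form, equation (2): P(correct) for ability theta and difficulty beta."""
    return math.exp(theta - beta) / (1.0 + math.exp(theta - beta))


def log_odds(p: float) -> float:
    """Logarithmic form, equation (1): ln[p / (1 - p)]."""
    return math.log(p / (1.0 - p))


# Reference difficulties quoted in the text, in logits.
REFERENCE_DIFFICULTY = {"easy": -1.0, "moderate": 0.0, "hard": 1.0}

if __name__ == "__main__":
    theta = 0.0  # hypothetical student ability, for illustration only
    for label, beta in REFERENCE_DIFFICULTY.items():
        p = rasch_probability(theta, beta)
        # The log-odds recovered from p equals theta - beta, linking (1) and (2).
        print(f"{label:>8}: beta={beta:+.1f}  P(correct)={p:.3f}  log-odds={log_odds(p):+.3f}")
```

When ability equals difficulty the probability is exactly 0.5, and it rises above or falls below 0.5 as the item becomes easier or harder than the student's ability, which is the behaviour described in Section III.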

Subsequently, to align exercises with students' abilities, the original exercise difficulties must be categorized, creating exercise samples. In this study, focusing on the knowledge of file management in the eighth chapter of the operating systems course, the Ministep software is employed to categorize the original exercises into low, medium, and high difficulty levels based on historical exercise data and difficulty standards.

Next, to explore students' cognitive abilities, the bidirectional fine-grained table method is commonly used. Employing this method, educators generate exercises of equivalent difficulty to the existing samples. Collaborating with ChatGPT, educators generate exercises that match the predetermined difficulty and knowledge scope. Tailored exercises are formulated according to students' distinct levels (A, B, C).

Finally, to evaluate exercise effectiveness and uncover students' cognitive abilities, this study utilizes the Rasch model as a foundation for analyzing exercise validity, rigor, and students' cognitive abilities. Through this sequence of steps, the study delves into methods for personalized exercise matching and student cognitive ability exploration. The detailed steps of the experimental design are shown in Figure 1.

FIGURE 1. Experimental Design Process

B. Experimental participants
The experimental subjects of this study consist of 123 freshmen majoring in Computer Science and Technology at a university in Chongqing, China. Among them, 58 are male and 65 are female, resulting in a fairly balanced gender distribution. These students are enrolled in an operating systems course for the duration of one academic semester. After nearly one semester of study, the students have become fairly acquainted with the subject matter of the operating systems course and have acquired certain competency requirements. Following their study of Chapter Eight, which covers the topic of file management, the students are immediately subjected to a test with the expectation of achieving favorable outcomes.

From the answer data collected in the previous five chapters, complete answer data is obtained from 64 students, while the remaining 59 students have incomplete answer data. Based on their learning performance and a comparison with the initial group of stratified students, the educator will perform further stratification. Utilizing the Ministep software and Whit's model, the students are divided into three strata. The Whit's model transforms item difficulty and student ability into the logit scale, ensuring that item difficulty and student ability are situated on the same level plane [17]. In the Whit's model representation, the difficulty values of exercises are displayed on the left side, denoted by an 'X' for each exercise, while the range of student abilities is shown on the right side, sequentially indicating student IDs, gender, and class. Higher-ability students are positioned higher, with overall student abilities ranging from -0.7 to 5 logit. In accordance with the categorization criteria mentioned in Section 4.1 (with an average gap of 1.0 logit between abilities and difficulties), as well as the actual answer data, this study classifies students with abilities ranging from 4.6 to 5 logit, from 2.2 to 3.8 logit, and from -0.7 to 1.8 logit into categories A, B, and C, respectively, as depicted in detail in Figure 2.
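
The stratification rule described above can be sketched as a simple range lookup. This is only an illustration under stated assumptions: the ability ranges are the ones quoted in the text, while the student identifiers and ability values are hypothetical, and the handling of abilities that fall between the reported ranges (returning `None`) is a choice made here, not something the study specifies.

```python
from typing import Optional

# Ability ranges in logits reported for the three strata.
STRATA = {"A": (4.6, 5.0), "B": (2.2, 3.8), "C": (-0.7, 1.8)}


def stratum(ability: float) -> Optional[str]:
    """Return the stratum whose reported range contains the ability, else None."""
    for label, (low, high) in STRATA.items():
        if low <= ability <= high:
            return label
    return None  # ability lies in a gap between the reported ranges


# Hypothetical student abilities (logits), for illustration only.
abilities = {"s01": 4.8, "s02": 3.1, "s03": 0.4, "s04": 2.0}
groups = {student: stratum(value) for student, value in abilities.items()}
print(groups)
```

In practice the measures would come from the Rasch software output rather than being typed in by hand.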

FIGURE 2. Personnel Level Division

C. Test Design
This study designs three sets of exercises tailored to the abilities of students in the ABC strata. Utilizing the existing 9 test questions as a sample, new exercises are devised based on the difficulty coefficients of these 9 questions. As illustrated in Figure 3, on the right side of the Whit's model, exercise difficulty is represented, with higher-difficulty exercises placed at higher positions. The overall difficulty coefficients of the exercises range from -2.3 to 4.1 logit. Specifically, difficulty values between -2.3 to -0.7 logit, 0.9 to 2.8 logit, and 4.1 logit are categorized as low, medium, and high difficulty levels, respectively. On the left side, the distribution of participants' answers is depicted, with each '#' symbol representing 3 individuals and each '.' symbol representing 2 individuals. It should be noted that the original exercises lack medium and high difficulty questions, so special attention will be given to their inclusion in the subsequent design process. The design of the exercises will refer to the original question difficulty as a reference point.

Before the exercise design process, a bidirectional fine-grained table will be created based on the relationship between test knowledge and abilities. This table will serve as a basis for the rational design of exercises. A bidirectional fine-grained table is a cross-tabulation of knowledge dimensions and skill dimensions [18]. The ability levels referenced in the bidirectional fine-grained table are generally based on the cognitive objective classification proposed by American educator Benjamin Bloom, which includes levels such as remembering, understanding, applying, analyzing, synthesizing, and evaluating [19]. For the knowledge of operating system file management, a corresponding bidirectional fine-grained table has been developed, as outlined in Table 1.

FIGURE 3. Problem difficulty division

TABLE 1
BIDIRECTIONAL BREAKDOWN OF FILE MANAGEMENT KNOWLEDGE
Exercise Number | Full Score | Knowledge Points (ABC) | Cognitive Ability | Question format | Difficulty level | Question type
1 | 5 | Record, file type, path | Understanding | combining | low | Multiple Choice
2 | 5 | File time, directory structure, index files | Understanding | combining | low | Multiple Choice
3 | 5 | Catalog file, file system, Filename extension | Understanding | teacher | low | Multiple Choice
4 | 5 | File attributes, file operations, fixed length records | Understanding | combining | low | Multiple Choice
5 | 5 | File operation, logical structure, file path | application analysis | teacher | medium | Multiple Choice
6 | 5 | Index files, file attributes, file hierarchy | application analysis | teacher | medium | Multiple Choice
7 | 5 | Index files, file organization, and structural files | application analysis | combining | medium | Multiple Choice
8 | 5 | Directory retrieval, logical files, file configuration | application analysis | teacher | medium | Multiple Choice
9 | 5 | Index file, file configuration, file location | comprehensive | teacher | high | Multiple Choice
10 | 5 | Logical structure, physical structure, file | comprehensive | teacher | high | Multiple Choice
11 | 5 | Directory retrieval, process, directory retrieval | comprehensive | teacher | high | Multiple Choice
12 | 5 | indexed sequential file | comprehensive | combining | high | Multiple Choice
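
To make the Test Design bookkeeping concrete, the sketch below encodes the difficulty bands quoted in the text and tallies the difficulty level and question format columns of Table 1. The band boundaries and the twelve (level, format) pairs are transcribed from the text and table; the function and variable names, and the decision to return `None` for measures outside every quoted band, are assumptions made for illustration.

```python
from collections import Counter
from typing import Optional

# Difficulty bands in logits quoted in the Test Design text.
BANDS = {"low": (-2.3, -0.7), "medium": (0.9, 2.8), "high": (4.1, 4.1)}


def difficulty_band(measure: float) -> Optional[str]:
    """Return the quoted band containing the item measure, else None."""
    for label, (low, high) in BANDS.items():
        if low <= measure <= high:
            return label
    return None  # measure falls outside every quoted band


# (difficulty level, question format) for exercises 1-12, transcribed from Table 1.
TABLE_1 = [
    ("low", "combining"), ("low", "combining"), ("low", "teacher"), ("low", "combining"),
    ("medium", "teacher"), ("medium", "teacher"), ("medium", "combining"), ("medium", "teacher"),
    ("high", "teacher"), ("high", "teacher"), ("high", "teacher"), ("high", "combining"),
]

by_level = Counter(level for level, _ in TABLE_1)
by_format = Counter(fmt for _, fmt in TABLE_1)
print(dict(by_level))
print(dict(by_format))
```

The tally shows four exercises at each difficulty level, which is consistent with the stated intent to add medium and high difficulty questions that the original exercises lacked.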
V. RESEARCH RESULT
A. Unidimensionality Hypothesis and Analysis
The success of the unidimensionality hypothesis is one of the conditions for using the Rasch model, which suggests that the examinees' performance in the test is primarily influenced by a single dominant trait. In this study, the confirmation of the unidimensionality hypothesis implies that the test items measured students' ability in the area of computer system file management knowledge and were within the scope of knowledge being assessed. If the unidimensionality hypothesis is not supported, further investigation and modification of the test items are required. Typically, residual principal component analysis can be used to determine the unidimensionality of the test items [20]. This approach examines the residuals (i.e., the differences between observed and predicted scores) to assess whether the test items conform to the unidimensionality assumption. To ensure the unidimensionality of the data, two criteria must be met: (a) the percentage of variance explained by the primary dimension should be at least 20%, and (b) the eigenvalue of the first secondary dimension should be less than 3 [21]. As shown in Table 2, the percentage of variance explained by the primary dimension for all three sets of exercises exceeds 20%, with sets B and C exceeding 40%. Additionally, the eigenvalues of the first secondary dimension are all less than 3. This indicates that all three sets of exercises conform to the unidimensionality assumption, effectively assessing the corresponding knowledge within the valid range. The design of the exercise scope is reasonable, allowing for Rasch model analysis and providing targeted instructional and learning guidance.

TABLE 2
UNIDIMENSIONAL ASSUMPTION COEFFICIENTS
Exercises | A | B | C
Raw variance explained by measures (Observed) | 20.2% | 40.5% | 43.5%
Unexplained variance in 1st contrast (Eigenvalue) | 2.1700 | 2.8088 | 2.3952

TABLE 3
FIT TEST RESULTS

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4

Item | A Infit Mnsq | A Infit Zstd | A Outfit Mnsq | A Outfit Zstd | B Infit Mnsq | B Infit Zstd | B Outfit Mnsq | B Outfit Zstd | C Infit Mnsq | C Infit Zstd | C Outfit Mnsq | C Outfit Zstd
1 | 1.07 | .75 | 1.04 | .30 | 1.05 | .26 | .75 | -.23 | .94 | .04 | .64 | -.16
2 | MINIMUM MEASURE | | MINIMUM MEASURE | | .75 | -.27 | .22 | -.68 | 1.02 | .24 | .67 | .00
3 | .73 | -.76 | .46 | -1.27 | .87 | -.46 | .67 | -.74 | 1.01 | .33 | .67 | .21
4 | 1.09 | .90 | 1.10 | .75 | .81 | -.29 | .43 | -.55 | 1.16 | .97 | 1.33 | 1.39
5 | 1.17 | .65 | 1.59 | 1.33 | .87 | -.15 | 1.20 | .51 | 1.11 | .55 | 1.49 | 1.07
6 | 1.01 | .09 | .96 | -.27 | .97 | .14 | .89 | .27 | .97 | -.01 | 1.40 | 1.03
7 | 1.10 | .80 | 1.07 | .46 | .98 | -.11 | .92 | -.32 | .90 | -.59 | .79 | -.54
8 | .95 | -.19 | .88 | -.33 | .96 | -.08 | .77 | -.45 | .89 | -.92 | .79 | -.96
9 | 1.03 | .21 | .94 | -.12 | 1.11 | .48 | 4.00 | 3.35 | .86 | -.05 | 1.19 | .53
10 | .82 | -1.15 | .71 | -1.29 | 1.04 | .22 | .98 | .14 | 1.03 | .22 | 1.00 | .12
11 | 1.02 | .22 | 1.01 | .09 | .79 | -1.20 | .98 | .01 | 1.01 | .15 | 1.09 | .42
12 | 1.00 | .04 | .99 | -.04 | 1.30 | 2.07 | 1.46 | 2.10 | .90 | -.75 | .84 | -.76
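
As a rough cross-check of Tables 2 and 3, the thresholds quoted in this section can be applied mechanically. The sketch below is illustrative only: the helper names `is_unidimensional` and `misfitting_items` are hypothetical and not part of the study's tooling, the values are transcribed by hand from the two tables, and the 0.5 to 1.5 mean-square range [22] is assumed to apply to both infit and outfit.

```python
from typing import Dict, List, Optional, Tuple

# (variance explained by measures in %, eigenvalue of the first contrast), from Table 2.
TABLE_2: Dict[str, Tuple[float, float]] = {
    "A": (20.2, 2.1700),
    "B": (40.5, 2.8088),
    "C": (43.5, 2.3952),
}

# (infit Mnsq, outfit Mnsq) for items 1..12, from Table 3.
# None stands for the "MINIMUM MEASURE" entry, which carries no fit value.
FitPair = Tuple[Optional[float], Optional[float]]
TABLE_3: Dict[str, List[FitPair]] = {
    "A": [(1.07, 1.04), (None, None), (0.73, 0.46), (1.09, 1.10),
          (1.17, 1.59), (1.01, 0.96), (1.10, 1.07), (0.95, 0.88),
          (1.03, 0.94), (0.82, 0.71), (1.02, 1.01), (1.00, 0.99)],
    "B": [(1.05, 0.75), (0.75, 0.22), (0.87, 0.67), (0.81, 0.43),
          (0.87, 1.20), (0.97, 0.89), (0.98, 0.92), (0.96, 0.77),
          (1.11, 4.00), (1.04, 0.98), (0.79, 0.98), (1.30, 1.46)],
    "C": [(0.94, 0.64), (1.02, 0.67), (1.01, 0.67), (1.16, 1.33),
          (1.11, 1.49), (0.97, 1.40), (0.90, 0.79), (0.89, 0.79),
          (0.86, 1.19), (1.03, 1.00), (1.01, 1.09), (0.90, 0.84)],
}


def is_unidimensional(variance_pct: float, eigenvalue: float) -> bool:
    """Criteria (a) and (b): at least 20% variance explained, eigenvalue below 3."""
    return variance_pct >= 20.0 and eigenvalue < 3.0


def misfitting_items(fits: List[FitPair], low: float = 0.5, high: float = 1.5) -> List[int]:
    """Return 1-based item numbers whose infit or outfit Mnsq lies outside [low, high]."""
    flagged = []
    for number, pair in enumerate(fits, start=1):
        values = [v for v in pair if v is not None]  # skip MINIMUM MEASURE entries
        if any(v < low or v > high for v in values):
            flagged.append(number)
    return flagged


for name in ("A", "B", "C"):
    print(name, is_unidimensional(*TABLE_2[name]), misfitting_items(TABLE_3[name]))
```

Under these assumptions, the sketch flags item 5 in set A and items 2, 4, and 9 in set B, the same items discussed in the goodness-of-fit analysis. A strict reading of the range also flags item 3 in set A, whose outfit Mnsq of .46 falls just below 0.5.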
B. Goodness of Fit Test
The goodness of fit is an indicator used to evaluate the degree of match between the data and
the model, including unweighted fit and weighted fit. Typically, a range of 0.5 to 1.5 is
considered as a reasonable fit between the data and the model [22]. The results of the goodness
of fit test for the three sets of items are presented in Table 3.
In the A test items, the Infit Mnsq and outfit Mnsq values for item 2 are both MINIMUM MEASURE,
indicating that this test item is invalid. By examining the raw data, it was found that the
correct rate for this item was 100%, suggesting that it was too easy and failed to produce the
expected effects. Therefore, this item should be removed. Additionally, the fit parameters for
item 5 in the A test and items 2, 4, and 9 in the B test do not fall within the range of the
model, indicating the need for further modifications to enhance the alignment with the model.
Upon examining the test items and the response patterns, it was identified that item 5 in the A
test requires detailed analysis and focuses on understanding the question requirements. If
students fail to grasp the fundamental meaning of the question, they are likely to select the
wrong answer. For the B test, items 2 and 4, which are basic questions, were answered
incorrectly by a minority of students, mainly due to carelessness. Item 9 had a high number of
incorrect responses, making it unable to differentiate students with different abilities and
failing to achieve the desired effect in assessing students' abilities. It is recommended to
remove this item.
C. Item and Ability Analysis
This study uses item characteristic curves (ICC), also known as Wright maps, to analyze the
generated exercises and students' answering abilities. A detailed description of the ICC can be
found in section 4.2. As shown in Figure 4, from left to right, we have ICCs for three types of
exercises: A, B, and C. On the left side, 'X' represents a student, and on the right side, the
numbers indicate the exercise sequence. By observing the three ICCs, it can be noted that the
average abilities of students are higher than the average difficulty values of the exercises
(0.6 logit > -0.5 logit, 1.7 logit > 0 logit, 0.5 logit > 0 logit). Overall, the students'
performance is higher than expected, indicating that the overall difficulty of the exercises
needs to be further strengthened.
From the A ICC, it can be observed that the students' ability distribution ranges from -2 logit
to -3 logit, while the difficulty values of the exercises range from -2 logit to -1.9 logit. The
difficulty of the exercises is lower than the students' ability by 1.1 logit. Both the students'
abilities and exercise difficulties follow a normal distribution, indicating a reasonably
balanced distribution. However, there is a lack of highly difficult exercises, so it is
necessary to develop exercises with higher difficulty in future improvements.
In the B ICC, the range of students' abilities is between -0.6 logit and -4 logit, while the
difficulty range of the exercises is between -1.8 logit and 3.6 logit. The difficulty values of
the first, second, fourth, and sixth exercises are lower than the minimum ability of students by
1.2 logit. Therefore, the lower difficulty exercises should be upgraded to match the abilities
of students in the range of 1 logit to 2.3 logit.
From the C ICC, it can be seen that the range of students' abilities is between -1.1 logit and
-3.8 logit, while the difficulty range of the exercises is between -3.2 logit and -4.1 logit.
The exercises show a vertical distribution, with a scarcity of exercises in the medium
difficulty level. Therefore, the difficulty of the first, second, and third exercises should be
upgraded to around 0.5 logit.
Overall, the development of the exercises roughly aligns with the students' abilities, but some
exercises require further development, taking into account the difficulty of existing
exercises. Among all the exercises


that need to be redeveloped, there are exercises created by teachers as well as exercises
created collaboratively by both teachers and ChatGPT. However, the proportion of collaboratively
created exercises is slightly higher. Overall, ChatGPT needs to improve the difficulty level
during exercise development. When creating exercises with low difficulty, the actual
requirements need to be considered, whether it is to ensure that all students master the content
to enhance their confidence or to challenge the abilities of most students.

FIGURE 4. The hierarchical Wright maps for exercises A, B, and C

In the three sets of exercises, there are three identical questions. In set A, they are
questions 8, 10, and 12; in set B, questions 8, 9, and 12; and in set C, questions 9, 11, and
12. Taking question 12 as an example, it was initially classified as the most difficult level
during development but in actuality, it is not the most difficult question. Its difficulty level
is mostly in the upper medium range. This indicates a deviation in the grading of exercise
difficulty. It would be better to arrange the exercise difficulty in ascending order, from easy
to difficult, and to reorganize the order of the exercises.
Additionally, students of different abilities have variations in their knowledge mastery, which
will be analyzed in section 4.4 regarding students' cognitive abilities. Based on the exercise
analysis results, this study will preliminarily utilize ChatGPT to upgrade the difficulty level
of the mentioned questions 1 and 2 in set B. As shown in Table 4, it can be seen that ChatGPT
mainly made changes to the options of the exercises, adding distractors to increase the
difficulty. It is worth noting that this is only based on the answers provided by ChatGPT, and
when redeveloping the exercises, a collaborative approach with teachers is still required for
design.

TABLE 4
DIFFICULTY UPGRADE
Number | Original question | Upgraded question
1 | Which of the following is a correct pathname in an operating system? A /home/user/file.txt B /home/user/./file.txt C /home/user//file.txt D /home/user/*.txt | Which of the following is a correct pathname in an operating system? A /home/user/file.txt B /home/user/./file.txt C /home/user//file.txt D \home\user\file.txt


2 | Which of the following is the main advantage of an index file? A. Fast data search and retrieval B. Reduced storage space requirements C. Prevention of data inconsistency D. Improved system stability | Which of the following is the main advantage of an index file? A. Fast data search and retrieval B. Reduced storage space requirements C. Prevention of data inconsistency D. Support for concurrent operations
3 | In Windows operating systems, which file extension is commonly associated with executable programs? A.bat B.Docx C.exe D.txt | In Windows operating systems, which file extension is commonly associated with executable programs? A.iso B.xlsx C.exe D.mp3
4 | Which of the following options most accurately defines keywords? (Low) A. All data in a record B. All data in a database C. Uniquely identify a record data item D. User interface of database management system | Which of the following options most accurately defines keywords? (Low) A. All data items in a record B. All data in a database C. Uniquely identify a record data item D. User interface of database management system

D. Analysis of Students' Cognitive Abilities
Based on the development of the two-way item matrix and Bloom's cognitive objective
classification, the three sets of exercises mainly assess students' abilities in understanding,
applied analysis, and synthesis. By observing the Wright maps for the three sets of exercises,
three students at an intermediate level were selected for cognitive ability analysis: student 19
for set A, student 12 for set B, and student 18 for set C. Ability profile graphs were created
for these three students, as shown in Figures 5, 6, and 7.

FIGURE 5. Ability of student 19 under exercise A

FIGURE 6. Ability of student 12 under exercise B

For student 19 under set A, their abilities exceed the average level in all aspects, but there
is still room for improvement in synthesis and applied analysis skills. Student 12 under set B
demonstrates above-average understanding and applied analysis abilities but falls short in
synthesis skills, requiring specific attention to enhancing synthesis capabilities. Student 18
under set C exhibits above-average understanding and applied analysis abilities, with notable
proficiency in applied analysis. However, their synthesis abilities are significantly below
average, demanding special attention and reinforcement.
Although the difficulty levels differ among the three sets of exercises, it can be observed that
students under set A perform well in all aspects and outperform other students. Students at all
levels demonstrate good performance in understanding and applied analysis abilities, while those
at levels B and C show poor performance in synthesis abilities, requiring focused improvement in
synthesis skills. Considering the varying difficulty levels of the three sets of exercises,
students' ability values fall within the same range, and they have reached the standards for
their respective self-levels. However, there are still differences in abilities among different
levels.
In the subsequent teaching process, teachers should emphasize the cultivation of students'
synthesis abilities. Once students achieve their self-level competency goals, consideration
should be given to fostering their progression into the next level and continuously enhancing
their abilities. The aim is to help students develop comprehensively in all cognitive abilities
and gradually improve their overall competency level.
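
The logit gaps reported in Section V.C can also be read as success probabilities. Under the dichotomous Rasch model, a student with ability θ answers an item of difficulty b correctly with probability 1 / (1 + exp(-(θ - b))). The sketch below applies this formula to the mean ability and mean difficulty quoted for each set. Treating a whole set as one "average student" facing one "average exercise" is a simplifying assumption made here for illustration, and `rasch_probability` is a hypothetical helper name rather than part of the study's tooling.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# (mean student ability, mean exercise difficulty) in logits, as quoted in Section V.C.
REPORTED_MEANS = {"A": (0.6, -0.5), "B": (1.7, 0.0), "C": (0.5, 0.0)}

for name, (ability, difficulty) in REPORTED_MEANS.items():
    gap = ability - difficulty
    p = rasch_probability(ability, difficulty)
    print(f"Set {name}: gap {gap:+.1f} logit, P(correct) = {p:.2f}")
```

With the reported means, the probabilities come out at roughly 0.75, 0.85, and 0.62 for sets A, B, and C. All three are above 0.5, which is consistent with the observation that the exercises are somewhat easy relative to the students' abilities.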


FIGURE 7. Ability of student 18 under exercise C

VI. DISCUSSION
In the context of generative artificial intelligence, leveraging AIGCP enhances the efficiency
of teachers in exercise design. Simultaneously, through the integration of AIGCP and the Rasch
model, combined with the bidirectional fine-grained table approach, students' cognitive
abilities can be effectively unearthed.
The cognitive taxonomy theory proposed by American psychologist Benjamin Bloom offers a
beneficial framework for this study, aiding in understanding the different levels of the
learning process. The cognitive taxonomy encompasses six levels: remembering (acquiring
information), understanding (interpreting information), applying (problem-solving), analyzing
(comparing information), synthesizing (developing new understanding), and evaluating (assessing
information) [23]. Findings suggest that when teachers employ AIGCP for exercise design, they
can design exercises corresponding to different student levels based on Bloom's cognitive
taxonomy in conjunction with learning objectives and student proficiency. By combining AIGCP
with the Rasch model, personalized exercise matching can be performed, and a more precise
assessment of students' cognitive levels can be achieved. Once teachers ascertain students'
cognitive abilities, they can tailor their teaching methods accordingly. For example, a student
with an ability of 1.0 logit could be guided by the teacher to reach a 2.0 logit ability. The
teacher can then provide exercises at the 2.0 logit level, analyze the areas where the student
needs improvement to reach the target, and offer guidance. This approach aligns with Vygotsky's
Zone of Proximal Development theory. Importantly, due to the personalized nature of exercise
matching, students with the same 2.0 logit ability may differ in cognitive levels. A student at
Level A with a 2.0 logit ability may outperform a student at Level B or C with the same 2.0
logit ability. Teachers must customize their instruction based on individual circumstances to
help students achieve their set goals. This comprehensive approach ensures more effective
exercise design and a better understanding of students' cognitive abilities, enabling teachers
to offer more targeted teaching strategies.
However, it is crucial to note that AIGCP also presents limitations. According to Grant Cooper's
research, the current state of ChatGPT poses a risk of positioning itself as the ultimate
cognitive authority for judgment, and there are significant ethical concerns related to AI,
including potential environmental impacts, content management issues, and copyright infringement
risks [24]. Exercises generated solely by ChatGPT may lack credibility and might not align with
the specified cognitive levels. Therefore, teachers must collaborate with knowledgeable
professionals to ensure the quality and accuracy of exercise design, particularly when striving
for personalized exercise matching. Additionally, while the combination of ChatGPT and the Rasch
model evaluates student cognitive abilities, professional teachers are still essential for
judgment and guidance.
In summary, the combination of ChatGPT and the Rasch model, along with the use of the
bidirectional item map approach, can enhance the efficiency of teacher-led exercise design and
uncover students' cognitive abilities. Compared to the previous approach that solely relied on
the Rasch model to assess exercise suitability and didn't consider exercise design efficiency,
the integration of these two methods provides a new possibility for exercise design and
cognitive ability exploration. However, in practical application, teachers should avoid
excessive reliance on AIGCPs and continue to rely on their own expertise to achieve personalized
teaching.

VII. CONCLUSION
The purpose of this study was to create personalized exercise papers that match the abilities of
students at three different levels, in order to explore methods for personalized exercise
matching and student cognitive abilities, providing support for personalized teaching.
By combining teachers and AIGCP and utilizing the Rasch model, suitable exercises can be created
for students at each level, enabling a reasonable assessment of their abilities. However,
exercises generated solely by relying on AIGCP cannot be fully trusted, as they perform poorly
in terms of exercise accuracy and matching difficulty. Therefore, exercise creation still needs
to involve the assistance of teachers to achieve better results.
Although AIGCP provides methods to enhance efficiency and personalize exercise matching in the
process of teacher-led exercise creation, there are also risks and challenges. In the field of
education, it is essential to use these technologies sensibly, harness their


potential to facilitate teaching, and combine them with teachers' expertise to improve students'
overall abilities and teaching quality.
However, this study also has certain limitations. The sample size and the number of exercises
generated are relatively small. The proposed combined approach needs further expansion into
other domains of exercise design to facilitate more in-depth and comprehensive research.
Additionally, there is a multitude of AIGCP options available, and this study specifically
employed ChatGPT for investigation. It remains to be explored whether other AIGCP possess
similar capabilities. In the future, it is imperative to enhance the accuracy of teachers'
ability to design appropriate exercises for students of varying levels using AIGCP. Moreover,
there is a need for in-depth exploration of the potential applications of different AIGCP,
aiming to establish a comprehensive repository of exercise resources across various disciplines
and educational stages. Teachers adeptly combining AIGCP and the Rasch model, employing the
bidirectional fine-grained table approach, can precisely uncover students' cognitive abilities
in a personalized manner, enabling tailored instructional guidance for individual students.

REFERENCES
[1] Ajevski, M., Barker, K., Gilbert, A., Hardie, L., & Ryan, F. (2023). ChatGPT and the future of legal education and practice. The Law Teacher, 1-13.
[2] Jiao Jianli (2023). ChatGPT Boosts the Digital Transformation of School Education: What to Learn and How to Teach in the Age of Artificial Intelligence. Distance Education in China (04), 16-23.
[3] Shoufan, A. (2023). Exploring Students’ Perceptions of CHATGPT: Thematic Analysis and Follow-Up Survey. IEEE Access.
[4] Rahman, M. M., & Watanobe, Y. (2023). ChatGPT for education and research: Opportunities, threats, and strategies. Applied Sciences, 13(9), 5783.
[5] You Jiajing, Tian Xing & Ding Nai (2023). Exploration of Junior High School Mathematics Learning Based on ChatGPT. Middle School Mathematics Monthly (05), 63-67.
[6] Talib, A. M., Alomary, F. O., & Alwadi, H. F. (2018). Assessment of student performance for course examination using Rasch measurement model: A case study of information technology fundamentals course. Education Research International, 2018, 1-8.
[7] Wu Fati, Tian Hao, Wang Yu & Fan Minsheng (2021). Research on Knowledge Mastery and Cognitive Ability Analysis Based on Rasch Model in the Perspective of Smart Education. Journal of East China Normal University (Education Science Edition) (08), 57-69.
[8] Liu Yixuan & Yao Jianxin (2023). Research on the maintenance technology of question banks based on the Rasch model. China Exam (04), 68-77.
[9] Yang Zhiming & Xia Shengjun (2021). Design and algorithm improvement of computerized adaptive multi-stage testing in the context of "double subtraction". Education Measurement and Evaluation (11), 3-9.
[10] Liu, K., Zhang, L., Tu, D., & Cai, Y. (2022). Developing an Item Bank of Computerized Adaptive Testing for Eating Disorders in Chinese University Students. SAGE Open, 12(4).
[11] Bond, T., & Fox, C. (2007). Applying The Rasch Model: fundamental measurement in the human sciences.
[12] Shen Dian & Xu Jiamin (2020). Research review on the quality of evaluation tools based on Rasch model analysis. China Exam (02), 65-71.
[13] Wallace, C. S., & Bailey, J. M. (2010). Do concept inventories actually measure anything. Astronomy Education Review, 9(1), 010116.
[14] Cai Minjun. Fundamentals of Rasch Model Education Measurement Application [M]. Beijing: China Science and Technology Press, 2021:10-11.
[15] Goodwin, S. (2016). A Many-Facet Rasch analysis comparing essay rater behavior on an academic English reading/writing test used for two purposes. Assessing Writing, 30, 21-31.
[16] Chi, E., & Chae, S. H. (2000). Theory and practice of the Rasch model. Seoul: Kyoyook Book.
[17] Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. 1960. Copenhagen, Denmark: Danish Institute for Educational Research.
[18] Wei Yuping & Pan Honghui (2016). Research on the Application of Bidirectional Inventory in Academic Evaluation. Contemporary Education Forum (04), 71-76.
[19] Zhang Xianglin (2018). The Application of Bidirectional Catalogue in Basic Chinese Teaching. Journal of Jiamusi Vocational College (01), 221-222.
[20] Chou, Y. T., & Wang, W. C. (2010). Checking dimensionality in item response models with principal component analysis on standardized residuals. Educational and Psychological Measurement, 70(5), 717-731.
[21] Miguel, J. P., Silva, J. T., & Prieto, G. (2013). Career decision self-efficacy scale-short form: a Rasch analysis of the Portuguese version. Journal of Vocational Behavior, 82(2), 116-123. https://doi.org/10.1016/j.jvb.2012.12.001.
[22] Linacre, J. M., & Wright, B. D. (2000). Winsteps. URL: http://www.winsteps.com/index.htm [accessed 2013-06-27][WebCite Cache].
[23] Fahyan, S. E. (1994). Taxonomy of educational objectives: the classification of educational goals / Benjamin S. Bloom. [et al.].
[24] Cooper, G. (2023). Examining science education in ChatGPT: An exploratory study of generative artificial intelligence. Journal of Science Education and Technology, 1-9.

Han Xue was born in Chongqing, China in 1999. He is currently studying for a master's degree in
modern educational technology at Chongqing Normal University. His main research directions
include educational data mining and educational evaluation.

Yanmin Niu graduated from Chongqing University with a doctor's degree in engineering. Associate
Professor, Master's Supervisor. Her main research direction is education data mining and smart
education.
