
MANCHESTER METROPOLITAN UNIVERSITY

DEPARTMENT OF COMPUTING AND MATHEMATICS

6G6Z0018 RESEARCH METHODS


FEASIBILITY STUDY

Student name: Hamad Ashraf


Student ID number: 18035950
Degree title: Software Engineering
Project theme: Software Engineering
Project title: Automated Essay Grading System

1 Aims and Objectives

1.1 Aims

The project involves designing, developing, and testing an automated essay grading system. This
includes researching existing systems, defining grading parameters, creating scoring algorithms,
designing a user interface, and iteratively refining the system for accuracy and usability through
rigorous testing.

1.2 Objectives

o Research Existing Systems: Investigate current automated grading systems and their
methodologies.
o Define Parameters: Establish criteria and metrics for grading essays (grammar, coherence,
content).
o Develop Algorithms: Create algorithms to assess and score essays based on predefined
parameters.
o Testing and Validation: Evaluate the system's accuracy and reliability through extensive testing.
o User Interface Design: Design an intuitive interface for users (teachers, students) to interact
with the system.

1.3 Learning Outcomes

The learning outcomes are as follows:

o How problems that can be addressed with computer software are identified and analysed, how
potential stakeholders for systems are identified, and how software engineers elicit
requirements from them.
o The software development lifecycle, and how each of the processes therein can be arranged
and managed to develop software using standard software engineering methodologies.
o The foundational underpinnings of computing, computer programming and computer systems.
o Common system architectures and designs that are used to overcome standard problems
associated with scale, geographic topology, interoperability and maintainability.
o The working practice of the software industry, and the typical tools and techniques employed in
the software development process.
o The methods and tools used to verify that software functions correctly, both as part of the
development process and during the software's operation.

2 Literature Survey

2.1 Introduction
The educational technology sphere has witnessed a boom in the popularity of Automated Essay
Grading Systems (AEGS) as a potential solution, owing to the constraints of manual essay
assessment that is subjective and time-consuming. Thanks to developments in natural language
processing and machine learning, advanced AEGS can now analyze and assess written work with
remarkable precision. This study examines the role of AEGS in mitigating the subjectivity inherent
in manual grading. The literature survey demonstrates how AEGS can save resources and function
consistently, which makes them an essential part of contemporary educational assessment
procedures. Consequently, this thorough examination of the scholarly and technological
environment pertaining to AEGS aims to highlight the importance of the proposed study in this
developing field.

2.2 Background information and expected activities

Automated Essay Scoring (AES) systems play a pivotal role in addressing the challenging task of
grading writing compositions efficiently. In the realm of language exams, where evaluating essays for
language proficiency is essential, AES systems leverage advanced technologies like Natural Language
Processing (NLP) and machine learning to streamline the scoring process. Hussein et al. (2019)
meticulously review the existing literature on AES, differentiating between two main categories:
handcrafted-feature AES systems and automatically featured AES systems. The former relies on the
quality of predefined features, while the latter involves the automatic learning of features and their
correlations with essay scores. The review evaluates these systems based on their primary focus,
techniques employed, reliance on training data, instructional applications, and the correlation between
AES scores and human scores.

Automated essay grading systems have undergone significant advancements, evolving to
demonstrate notable accuracy and reliability over time. Despite initial skepticism regarding the aptness of
machine-generated scores compared to manual grading, these systems have proven consistent and
efficient. Human evaluators' subjectivity in grading essays has long been identified as a challenge, leading
to variations in assigned grades and potential unfairness. In contrast, automated essay evaluation systems
offer uniformity in scoring, ensuring fairness, and additionally, they contribute to significant time and
cost savings (Vijaya et al., 2022).

As essays gain prominence as essential tools for assessing academic accomplishments, fostering
new ideas, and testing students' recall abilities, the integration of automated grading systems becomes
increasingly relevant. An in-depth exploration of the characteristics of these systems, including their
reliance on machine learning, artificial intelligence, and natural language processing, allows researchers
to analyze the features determining essay grades, such as style, semantics, and content.

According to Ifenthaler (2022), in the contemporary education system, the role of assessment in
gauging student performance is crucial. The prevailing evaluation method relies heavily on human
assessment, but with an increasing teacher-student ratio, manual evaluation becomes intricate. Manual
assessment is time-consuming, lacks reliability, and poses numerous challenges. Consequently, online
examination systems have emerged as a viable alternative to traditional pen-and-paper methods. While
current computer-based evaluation systems are adept at handling multiple-choice questions, they fall short
in providing a comprehensive evaluation for essays and short answers. Over the past few decades,
researchers have delved into automated essay grading and short answer scoring, addressing challenges in
content relevance, idea development, cohesion, coherence, and various stylistic aspects.

Historically, the necessity for efficient and objective essay grading has driven the evolution of
automated systems. Early attempts, such as Project Essay Grader (PEG) in 1966, focused on assessing
writing characteristics like grammar and construction. The Intelligent Essay Assessor (IEA) introduced by
Foltz et al. (1999) used latent semantic analysis for content evaluation. Later systems like E-rater and
IntelliMetric incorporated natural language processing techniques for style and content assessment.
Traditional approaches in the 1990s relied on pattern matching and statistical methods, while recent
developments have seen a shift towards regression-based and natural language processing techniques.
Notably, advancements in deep learning, as seen in Dong et al. (2017), have enhanced the capability of
automated essay scoring systems by incorporating syntactic and semantic features.

Work by Hussein et al. (2019) reveals that AES models exhibit versatility by utilizing a wide
range of manually tuned linguistic features, encompassing both shallow and deep linguistic aspects. These
systems exhibit strengths in mitigating labor-intensive grading activities, ensuring consistent application
of scoring criteria, and maintaining objectivity in the assessment process. However, despite continuous
advancements in AES techniques, challenges persist. Notably, the systems struggle with capturing the
human sense of a rater, the potential for deceptive scoring, and the limited ability to assess the creativity
and practicality of ideas. While efforts have been made to address the first two challenges, the paper
emphasizes the need for future research to focus on enhancing AES systems' capabilities in assessing the
more nuanced aspects of writing compositions.

Susanti et al. (2023) provide a comprehensive analysis of various papers, highlighting the
dataset used, the methodology applied, and an explanatory account of the automated essay correction
process. Their findings suggest that the implementation of automated essay correction systems in
education has progressed successfully, employing various methods to expedite and enhance the efficiency
of essay grading. Among the ten reviewed papers, a multitude of algorithms and techniques were
employed to measure the similarity of students' responses to answer keys, resulting in automated scores.
The datasets utilized originated from student responses, processed using distinct algorithms detailed in the
respective studies. The results collectively demonstrate the evolving landscape of auto-correcting essay
exams, presenting opportunities for future research and development in machine learning models to
diversify and improve automated essay scoring systems.

Ramalingam et al. (2018) state that in the realm of education, essays serve as a vital tool for
assessing academic excellence and connecting diverse ideas, demonstrating the ability to recall
information. However, the manual assessment of essays is known for its time-consuming nature and
subjectivity. The increasing teacher-student ratio further complicates the manual grading process, making
it an expensive and resource-intensive endeavor. Recognizing these challenges, their project aims to
develop an automated essay assessment system using machine learning techniques.

The objective is to classify a corpus of textual entities into discrete categories corresponding to
possible grades. By leveraging linear regression and various classification and clustering techniques, the
proposed system seeks to reduce assessment time and enhance the realism of scores by automating the
grading process. Computers have influenced writing for over four decades, with even basic computing
functions significantly aiding authors in refining their written material. As effective cognitive tools,
computers have the potential to enhance the writing process (Wen et al., 2022). The
exploration of various methodologies within the context of machine learning and artificial intelligence
offers researchers valuable insights into the multifaceted aspects contributing to the accurate and reliable
assessment of essays, marking a significant shift in the landscape of educational evaluation.

Previous research in AEGS has focused on various approaches, including rule-based systems,
statistical models, and more recently, deep learning techniques. Early systems employed predefined rules
and heuristics to assess essays, but their effectiveness was limited by the complexity of language and the
inability to capture nuanced writing styles. Subsequent studies introduced statistical models, such as
Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), which demonstrated improved
performance by considering semantic relationships and topic modeling.
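
As a brief illustration of the LSA approach mentioned above, consider the following minimal Python sketch. It assumes scikit-learn is available; the toy essays and the two-dimensional latent space are illustrative assumptions, not details of the cited systems.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: the first two essays share a topic, the third does not.
essays = [
    "The industrial revolution transformed manufacturing and society.",
    "Factories and new machines reshaped how goods were produced.",
    "Photosynthesis converts sunlight into chemical energy in plants.",
]

# Term-document matrix weighted by TF-IDF.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(essays)

# Project into a low-dimensional latent semantic space.
lsa = TruncatedSVD(n_components=2, random_state=0)
X_lsa = lsa.fit_transform(X)

# Topically related essays should score higher than unrelated ones.
print(cosine_similarity(X_lsa[:1], X_lsa[1:]))

In a grading context, such similarity scores against reference texts can serve as one content feature among many.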

Recent advancements in deep learning, particularly recurrent neural networks (RNNs) and
transformer-based models, have shown remarkable success in capturing contextual information and
understanding the intricate nuances of human language. However, challenges persist in achieving a
holistic understanding of essays, including context, coherence, and creativity.

2.3 Technical Background


The technical landscape of AEGS involves the integration of natural language processing (NLP)
techniques and machine learning algorithms. NLP algorithms are essential for understanding the
syntactic and semantic structure of essays, enabling the system to extract meaningful features for
evaluation. Machine learning models, ranging from traditional classifiers to sophisticated deep
learning architectures, play a pivotal role in predicting essay grades based on these extracted
features. Key technical challenges include the development of models capable of handling various
writing styles, addressing ambiguous or creative language, and ensuring fair and unbiased grading.
The exploration of explainable AI (XAI) techniques is also crucial, as understanding the reasoning
behind automated grading decisions is paramount for user acceptance and trust.
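
To make this pipeline concrete, the following is a minimal Python sketch of the feature-extraction and prediction stages, assuming scikit-learn. The toy essays, grade labels, and the choice of TF-IDF features with logistic regression are illustrative assumptions, not the proposed system's final design.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set; a real AEGS would use thousands
# of human-scored essays.
train_essays = [
    "A well structured argument with clear evidence and analysis.",
    "Some relevant points but limited development and weak cohesion.",
    "An off topic response with frequent errors and little structure.",
]
train_grades = ["A", "B", "C"]  # human-assigned reference grades

# Word and bigram statistics stand in here for the richer syntactic
# and semantic features discussed above.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_essays, train_grades)

print(model.predict(["A clear, well supported argument."]))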

2.4 Importance of proposed work


The proposed work aims to contribute to the ongoing evolution of AEGS by addressing the
limitations of existing models and exploring novel approaches. The significance of this research lies
in its potential to enhance the accuracy and reliability of automated essay grading, thereby
providing educators with a valuable tool for timely and consistent assessment. Additionally, the
project intends to delve into the ethical considerations associated with AEGS, ensuring that the
deployment of such systems aligns with principles of fairness and transparency. As educational
institutions increasingly adopt online learning platforms, the demand for efficient and scalable
essay grading solutions continues to grow. By advancing the state-of-the-art in AEGS, this research
seeks to make a meaningful impact on the educational assessment landscape, benefiting both
educators and students.

2.5 Evaluation plan

The evaluation plan for the Creative Piece involves a comprehensive assessment of the developed
AEGS in comparison to existing models. The evaluation will encompass both experimental procedures
and statistical tests to measure the system's effectiveness in grading essays accurately and consistently.

Experimental Procedures

The evaluation will involve a diverse dataset comprising essays from different academic levels
and disciplines. Essays will be sourced from established databases and educational institutions, ensuring a
representative sample that captures the variability in writing styles and content. The dataset will be
preprocessed to remove identifying information and standardized for fairness. The AEGS will be trained
on a subset of the dataset and validated on another subset to fine-tune model parameters. The evaluation
will include a rigorous testing phase on a separate test set to assess the system's generalization capability
(Hussein et al., 2019). Performance metrics such as precision, recall, and F1 score will be calculated to
quantify the model's accuracy, and additional qualitative analysis will be conducted to assess its ability to
handle creative and nuanced writing.
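
The metric-computation step can be sketched in Python as follows, assuming scikit-learn; the grade arrays are illustrative placeholders rather than real results.

from sklearn.metrics import precision_recall_fscore_support

human_grades = ["A", "B", "B", "C", "A", "C"]   # reference labels
system_grades = ["A", "B", "C", "C", "B", "C"]  # AEGS predictions

# Macro-averaging weights each grade category equally.
precision, recall, f1, _ = precision_recall_fscore_support(
    human_grades, system_grades, average="macro", zero_division=0
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")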

In addition to the aforementioned experimental procedures, the evaluation plan for the Automated
Essay Grading System (AEGS) will implement a thorough analysis of the system's performance across
different essay prompts and topics. To ensure a comprehensive assessment, the dataset will encompass
essays from various genres, including argumentative, descriptive, narrative, and expository writing. This
diversity in prompts aims to evaluate the AEGS's adaptability and effectiveness in grading essays with
distinct rhetorical and structural characteristics. Furthermore, the evaluation will incorporate a
comparative analysis with existing manual grading methods (Hussein et al., 2019). A subset of the essays
will be randomly selected and graded by human evaluators to establish a benchmark for comparison. This
comparative assessment will not only provide insights into the AEGS's accuracy but also shed light on
potential areas for improvement or refinement.
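
One way to quantify agreement with the human benchmark is to correlate system scores with human scores on the same essays. The following Python sketch assumes SciPy; the score vectors are illustrative assumptions.

from scipy.stats import pearsonr

human_scores = [72, 65, 88, 54, 91, 60]  # benchmark grades from human raters
aegs_scores = [70, 68, 85, 58, 93, 57]   # system grades for the same essays

# Pearson's r close to 1 indicates close agreement with human grading.
r, p_value = pearsonr(human_scores, aegs_scores)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")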

To enhance the effectiveness of our evaluation, a sensitivity analysis will be conducted, exploring
the AEGS's performance under various conditions, such as different essay lengths and levels of
complexity. This analysis aims to identify any potential biases or limitations in the system's grading
capabilities and ensure its reliability across a spectrum of writing scenarios. Moreover, the evaluation
plan will address potential ethical considerations related to bias in grading. An analysis of the AEGS's
performance across diverse demographics and backgrounds will be conducted to identify and mitigate any
disparities in its evaluation outcomes.

Statistical Tests and Comparison Algorithms

Statistical significance tests, such as t-tests or Wilcoxon signed-rank tests, will be employed to
compare the performance of the developed AEGS against baseline models and state-of-the-art grading
systems. The comparison will extend beyond traditional accuracy metrics, considering factors like
fairness, interpretability, and computational efficiency. Additionally, the evaluation will explore the
impact of fine-tuning hyperparameters and the use of different pre-trained language models on the
AEGS's performance. Comparative analysis with existing AEGS will provide insights into the
advancements achieved by the proposed work.
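
As a sketch of how such a paired test might be run, the following Python snippet assumes SciPy; the per-essay absolute-error values for the baseline and the proposed system are illustrative assumptions.

from scipy.stats import wilcoxon

# Absolute scoring errors (versus human grades) on the same eight essays.
baseline_errors = [4.0, 3.5, 5.2, 2.8, 6.1, 3.9, 4.4, 5.0]
proposed_errors = [3.1, 2.9, 4.8, 2.5, 5.0, 3.2, 4.1, 4.6]

# The Wilcoxon signed-rank test makes no normality assumption,
# which suits small paired samples of grading errors.
stat, p_value = wilcoxon(baseline_errors, proposed_errors)
print(f"W = {stat:.1f}, p = {p_value:.3f}")  # small p suggests a real improvement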

2.6 Ethical issues, physical risks and mitigations of both

Research Ethics

As with any technological development, AEGS raises ethical concerns related to bias, fairness,
and privacy. It is imperative to address these concerns throughout the research process. The use of diverse
datasets, carefully curated to avoid bias, will be a primary mitigation strategy (Weidinger et al., 2021).
Moreover, the AEGS will undergo rigorous fairness assessments to identify and rectify any potential
biases in grading based on factors such as gender, ethnicity, or socio-economic background. Transparency
and accountability in the decision-making process will be emphasized, with efforts to provide clear
explanations for grading decisions through the incorporation of explainable AI techniques. Ethical
considerations will guide the development of guidelines for the responsible deployment of AEGS in
educational settings, promoting transparency and user awareness.

In addition to the outlined strategies for mitigating bias and ensuring fairness, the ethical
considerations surrounding Automated Essay Grading Systems (AEGS) will extend to encompass a
robust privacy framework. Protection of student data and maintaining confidentiality will be of
paramount importance (Weidinger et al., 2021). Stringent measures will be implemented to secure the
essay datasets, ensuring that no personally identifiable information is disclosed or accessible. The
research process will include a thorough privacy impact assessment to identify potential risks and
vulnerabilities related to data handling and storage. Data encryption protocols, secure storage practices,
and controlled access measures will be implemented to safeguard against unauthorized disclosure or
breaches. Additionally, explicit consent mechanisms will be established, clarifying how student data will
be used in the development and evaluation of the AEGS, fostering transparency and trust among users.

Moreover, the ethical considerations will extend to the responsible deployment of the AEGS
within educational institutions. Guidelines and protocols will be developed to guide educators and
administrators on the ethical utilization of the system, ensuring that its implementation aligns with
principles of fairness, accountability, and respect for student privacy (Weidinger et al., 2021).

Physical Risks and Participant Safety

Given that this research primarily involves data analysis and model development, there are minimal
physical risks associated with the project. However, it is crucial to consider the potential psychological
impact on participants, especially if the essays involve personal or sensitive content. Anonymizing and
de-identifying the dataset will be a key measure to protect the privacy and confidentiality of participants
(Weidinger et al., 2021). Additionally, informed consent will be obtained from any individuals
contributing essays to the dataset, clearly outlining the purpose of the research and how their data will be
used. Protocols for secure data storage and disposal will be implemented to safeguard participants'
information.
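
As a small illustration of one de-identification step, the following hedged Python sketch uses the standard re module to scrub obvious identifiers. The patterns and placeholders are illustrative assumptions; a production pipeline would add, for example, named-entity recognition for person names.

import re

# Illustrative patterns for common identifiers and their placeholders.
PATTERNS = {
    r"[\w.+-]+@[\w-]+\.[\w.]+": "[EMAIL]",
    r"\b\d{3}[- ]?\d{3,4}[- ]?\d{4}\b": "[PHONE]",
    r"\b\d{8}\b": "[STUDENT_ID]",
}

def deidentify(text: str) -> str:
    """Replace matching identifiers with neutral placeholders."""
    for pattern, placeholder in PATTERNS.items():
        text = re.sub(pattern, placeholder, text)
    return text

print(deidentify("Call 555-123-4567, ID 12345678, email jane.doe@example.com."))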

3 References

Dong, F., Zhang, Y. and Yang, J., 2017, August. Attention-based recurrent convolutional neural network for automatic essay scoring. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017) (pp. 153-162). https://aclanthology.org/K17-1017/

Hussein, M.A., Hassan, H. and Nassef, M., 2019. Automated language essay scoring systems: A literature review. PeerJ Computer Science, 5, p. e208.

Ifenthaler, D., 2022. Automated essay scoring systems. In Handbook of Open, Distance and Digital Education (pp. 1-15). Singapore: Springer Nature Singapore.

Ramalingam, V.V., Pandian, A., Chetry, P. and Nigam, H., 2018, April. Automated essay grading using machine learning algorithm. In Journal of Physics: Conference Series (Vol. 1000, p. 012030). IOP Publishing. https://iopscience.iop.org/article/10.1088/1742-6596/1000/1/012030/meta

Ramesh, D. and Sanampudi, S.K., 2022. An automated essay scoring systems: A systematic literature review. Artificial Intelligence Review, 55(3), pp. 2495-2527.

Shetty, S., Guruvyas, K.R., Patil, P.P. and Acharya, J.J., 2022, March. Essay scoring systems using AI and feature extraction: A review. In Proceedings of Third International Conference on Communication, Computing and Electronics Systems: ICCCES 2021 (pp. 45-57). Singapore: Springer Singapore.

Susanti, M.N.I., Ramadhan, A. and Warnars, H.L.H.S., 2023. Automatic essay exam scoring system: A systematic literature review. Procedia Computer Science, 216, pp. 531-538.

Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.S., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A. and Kenton, Z., 2021. Ethical and social risks of harm from language models. arXiv preprint arXiv:2112.04359.

Wen, X. and Walters, S.M., 2022. The impact of technology on students’ writing performances in elementary classrooms: A meta-analysis. Computers and Education Open, 3, p. 100082.
