
Abstract

The advent of AI-empowered chatbots capable of constructing human-like sentences and articulating cohesive essays has captivated global interest. This paper provides a historical
perspective on chatbots, focusing on the technology underpinning the Chat Generative Pre-trained
Transformer, better known as ChatGPT. We underscore the potential utility of ChatGPT across a
multitude of fields, including healthcare, education, and research. To the best of our knowledge, this
is the first review that not only highlights the applications of ChatGPT in multiple domains, but also
analyzes its performance on examinations across various disciplines. Despite its promising
capabilities, ChatGPT raises numerous ethical and privacy concerns that are meticulously explored
in this paper. Acknowledging the current limitations of ChatGPT is crucial in understanding its
potential for growth. We also ask ChatGPT to provide its point of view and present its responses to
several questions we attempt to answer.

Keywords: artificial intelligence, natural language processing, ChatGPT, text generation, chatbots


TABLE OF CONTENTS

Chapter-1: Introduction

1.1 Introduction
1.2 What is ChatGPT
1.3 ChatGPT Technology: From Language Models to Reinforcement Learning

Chapter-2: Working and Architecture

2.1 How Does ChatGPT Work?
2.2 How is ChatGPT Trained?
2.3 ChatGPT Architecture

Chapter-3: Adoption and Analysis

3.1 Evolution of OpenAI's GPT
3.1.1 GPT-1
3.1.2 GPT-2
3.1.3 GPT-3
3.1.4 GPT-3.5
3.1.5 GPT-4
3.2 ChatGPT Adoption
3.3 Analysis of ChatGPT Performance on Exams

Chapter-4: Advantages and Applications

4.1 Advantages of ChatGPT
4.2 Applications of ChatGPT
4.3 Limitations of ChatGPT

Chapter-5: Conclusion

5.1 Conclusion

References
CHAPTER-1
INTRODUCTION
1.1 Introduction

In 1950, the British computer scientist Alan Turing asked whether human reasoning could be
matched by computers: "Can machines think?" (Turing, 1950). He subsequently proposed the
Turing Test as a measure of machine intelligence. In a Turing Test, a human interrogator is
presented with responses from a human and a computer (with the ability to generate written texts in
real-time). If the interrogator cannot distinguish between the answers, the computer system passes
the Turing Test. Although several computer programs and chatbots like ELIZA demonstrated success
in the Turing Test (Weizenbaum, 1966; Güzeldere & Franchi, 1995), these programs arguably
used certain tricks to pass the test (Pinar Saygin et al., 2000) rather than demonstrating any
significant intelligence. With the advancement in machine learning and natural language processing
(NLP), chatbots have gained significant research attention and have been used for a variety of
commercial and non-commercial applications (Luo et al., 2022; Adamopoulou & Moussiades,
2020; Ranoliya et al., 2017; Rahman et al., 2017; Zhou et al., 2020). Despite their vast
adoption, most chatbots lack personalization, and user satisfaction remains questionable
(Følstad & Brandtzaeg, 2020). This limitation prompted researchers and developers to focus on
engagement and on making chatbots more conversational.

In November 2022, OpenAI released a novel AI-powered chatbot called ChatGPT, which rapidly
gained substantial public attention. Distinguishing itself from prior chatbots, ChatGPT has
demonstrated notable proficiency in generating logically consistent responses to a broad array of
inquiries. In this review, we investigate the technological evolution that facilitates ChatGPT's
remarkable capabilities and illuminate various domains where this innovative tool finds its
applications. Moreover, we leverage the conversational capabilities of ChatGPT to elicit its
responses to a series of queries we sought to address.

1.2 What is ChatGPT

ChatGPT, which stands for Chat Generative Pre-trained Transformer, is a large language
model-based chatbot developed by OpenAI and launched on November 30, 2022, that enables users
to refine and steer a conversation towards a desired length, format, style, level of detail, and
language. Successive prompts and replies (a practice known as prompt engineering) are taken into
account at each stage of the conversation as context.

ChatGPT is built upon either GPT-3.5 or GPT-4, members of OpenAI's proprietary series of
generative pre-trained transformer (GPT) models based on the transformer architecture developed
by Google[3], and is fine-tuned for conversational applications using a combination
of supervised and reinforcement learning techniques. ChatGPT was released as a freely available
research preview, but due to its popularity, OpenAI now operates the service on a freemium model.
It allows users on its free tier to access the GPT-3.5-based version. In contrast, the more advanced
GPT-4-based version and priority access to newer features are provided to paid subscribers under the
commercial name "ChatGPT Plus".

By January 2023, it had become what was then the fastest-growing consumer software application in
history, gaining over 100 million users and contributing to OpenAI's valuation growing to $29
billion.[5][6] ChatGPT's release spurred the development of competing products, including Bard, Ernie
Bot, LLaMA, Claude, and Grok.[7] Microsoft launched its Bing Chat based on OpenAI's GPT-4.
ChatGPT's release also raised concern among some observers over the potential of such programs to
displace or atrophy human intelligence, enable plagiarism, or fuel misinformation.

1.3 ChatGPT Technology: From Language Models to Reinforcement Learning

Machine learning, a subfield of artificial intelligence (AI), is defined as the ability of computer
systems to learn from experience without being explicitly taught or programmed. With advances in
computing power, increasing data availability, and algorithmic improvements, deep learning has
emerged as a high-performing predictive tool. Unlike traditional machine learning, deep learning
algorithms use artificial neural networks, often at very large scale. Deep learning models generally
require more training data but can provide better generalization. Transformer-based large language
models are a special family of deep learning models that have emerged as state-of-the-art solutions
for text data. From text classification to text generation, these models offer performance superior
to that of other AI models. Language models are usually pre-trained on large text corpora, such as
Wikipedia, and can then be fine-tuned for a specific task. The two most prominent language models
include Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019) and
Generative Pretrained Transformer (GPT) (Radford et al., 2019). The former is primarily used for
recognizing the context in language and classification tasks, while the latter is more adept at
language generation tasks such as machine translation and question-answering.
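
As a minimal illustration of how a generative pre-trained transformer produces text, the sketch below uses the openly available GPT-2 model through the Hugging Face transformers library. GPT-2 stands in here only because ChatGPT's own model weights are not publicly downloadable; the model choice and the prompt are illustrative assumptions, not the authors' setup.

    # Text generation with a pre-trained GPT-style model via Hugging Face
    # transformers (assumes `pip install transformers torch`).
    from transformers import pipeline

    # GPT-2 is an openly available generative pre-trained transformer.
    generator = pipeline("text-generation", model="gpt2")

    # The model autoregressively predicts one token at a time, conditioned
    # on the prompt and on everything it has generated so far.
    result = generator("Artificial intelligence is", max_new_tokens=30)
    print(result[0]["generated_text"])
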
Several chatbots prior to ChatGPT used language models. However, the main component behind
ChatGPT's coherent and engaging responses is its use of reinforcement learning. In this context,
human evaluators ranked the responses of ChatGPT, which allowed it to optimize some of its
parameters and essentially become a better conversationalist. ChatGPT was developed on top of
GPT-3, a generative autoregressive language model with 175 billion parameters (Brown et al., 2020).
To train such a complex network, OpenAI partnered with Microsoft, with reported training resources
of over 350 GB of memory.
CHAPTER-2
WORKING AND ARCHITECTURE

2.1 How Does ChatGPT Work?

ChatGPT works by attempting to understand a text input (called a prompt) and generating dynamic
text in response. It can do this because it's a large language model (LLM): essentially a super-sized
computer program that can understand and produce natural language.

Here’s how ChatGPT describes it:

ChatGPT’s creators used a deep learning training process to make this possible. In other words, they
gave this computer the tools to process data like a human brain does.

Eventually, the system could recognize patterns in words, follow examples, and then create its own
text in response.

According to a research paper by OpenAI, the training data for ChatGPT’s LLM included 45TB of
compressed plaintext. For reference, one TB works out at roughly 6.5 million document pages.

ChatGPT has been a household name for less than a year, but the algorithms working in the
background of the popular AI tool have actually been powering a whole range of apps and services
since 2020. So to understand how ChatGPT works, we need to start by talking about the underlying
language engine that powers it.

The GPT in ChatGPT is mostly two related algorithms: GPT-3.5 Turbo and GPT-4, though the latter
is only available in ChatGPT for ChatGPT Plus subscribers. The GPT bit stands for Generative Pre-
trained Transformer, and the number is just the version of the algorithm. The GPT models were
developed by OpenAI (the company behind ChatGPT and the image generator DALL·E 2), but they
power everything from Bing's AI features to writing tools like Jasper and Copy.ai. In fact, most of
the AI text generators available at the moment use GPT-3, GPT-3.5, or GPT-4, though providers tend
to keep quiet about which one they use.



ChatGPT brought GPT into the limelight because it made the process of interacting with an AI text
generator simple and—most importantly—free to everyone. Plus, it's a chatbot, and people have
loved a good chatbot since SmarterChild.

While GPT-3.5 and GPT-4 are the most popular large language models (LLMs) right now, over the
next few years, there's likely to be a lot more competition. Google, for example, has Bard—its AI
chatbot—which is powered by its own language engine, Pathways Language Model (PaLM 2). Meta,
Facebook's parent company, released Llama 2, its latest LLM. And then there are other options
aimed at large companies, like Writer's Palmyra LLMs and Anthropic's Claude. Still, at least for
now, OpenAI's offering is the de facto industry standard. It's just the easiest tool for people to get
their hands on.

So the answer to "how does ChatGPT work?" is basically: GPT-3.5 and GPT-4. But let's dig a little
deeper.
2.2 How is ChatGPT trained?

ChatGPT is trained on copious amounts of conversational exchanges, fine-tuned to a specific task or
domain, using a variant of the transformer architecture to predict words. Instead of being fed
answers, the model is allowed to pick up patterns and relationships in the data on its own.

ChatGPT, like InstructGPT (its sibling model), is trained using Reinforcement Learning from Human
Feedback (RLHF), with only a slight difference in the data collection setup. Training the model
involved supervised fine-tuning, in which human AI trainers played both sides of the conversation:
the user and the AI assistant. The trainers composed their responses with the help of model-written
suggestions. The resulting dialogue dataset was then mixed with the dataset from InstructGPT,
transformed into a dialogue format.

Creating a reward model for reinforcement learning requires comparison data consisting of model
responses ranked by quality. A selection of conversations between the AI trainers and the chatbot
was used to collect this data: a randomly selected model-written message was chosen, several
completions of that message were sampled, and AI trainers were asked to rank them. Based on these
reward models, ChatGPT was fine-tuned using Proximal Policy Optimization (PPO). Several
iterations of this process were performed.

ChatGPT is based on particular GPT foundation models, namely GPT-3.5 and GPT-4, that were fine-
tuned to target conversational usage.[9] The fine-tuning process leveraged both supervised learning as
well as reinforcement learning in a process called reinforcement learning from human
feedback (RLHF).[10][11] Both approaches employed human trainers to improve model performance. In
the case of supervised learning, the trainers played both sides: the user and the AI assistant. In the
reinforcement learning stage, human trainers first ranked responses that the model had created in a
previous conversation.[12] These rankings were used to create "reward models" that were used to fine-
tune the model further by using several iterations of Proximal Policy Optimization (PPO).[10][13]
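
To make the reward-modeling step concrete, the sketch below implements the pairwise ranking objective commonly described for InstructGPT-style RLHF, loss = -log sigmoid(r_chosen - r_rejected), in PyTorch. The tiny linear "reward model" and the random response embeddings are placeholders for illustration; the real reward model is a full transformer scoring entire responses.

    # Pairwise ranking loss for an RLHF reward model (PyTorch sketch).
    import torch
    import torch.nn.functional as F

    embed_dim = 16
    # Placeholder reward model: maps a response embedding to a scalar reward.
    reward_model = torch.nn.Linear(embed_dim, 1)

    # Placeholder embeddings for human-preferred vs. rejected responses.
    chosen = torch.randn(4, embed_dim)
    rejected = torch.randn(4, embed_dim)

    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)

    # Train the model to score preferred responses higher than rejected ones.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    loss.backward()
    print(float(loss))

The trained reward model then supplies the scalar reward that PPO maximizes during the reinforcement learning stage described above.
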

Time magazine revealed that to build a safety system against harmful content (e.g. sexual
abuse, violence, racism, sexism), OpenAI used outsourced Kenyan workers earning less than $2 per
hour to label harmful content. These labels were used to train a model to detect such content in the
future. The outsourced laborers were exposed to "toxic" and traumatic content; one worker described
the assignment as "torture". OpenAI's outsourcing partner was Sama, a training-data company based
in San Francisco, California.[14]

ChatGPT initially used a Microsoft Azure supercomputing infrastructure, powered by Nvidia GPUs,
that Microsoft built specifically for OpenAI and that reportedly cost "hundreds of millions of
dollars". Following ChatGPT's success, Microsoft dramatically upgraded the OpenAI infrastructure
in 2023.[15] Scientists at the University of California, Riverside, estimate that a series of prompts to
ChatGPT consumes approximately 500 milliliters of water for cooling Microsoft's servers.[16] TrendForce
market intelligence estimated that 30,000 Nvidia GPUs (each costing approximately $30,000 [17])
were used to power ChatGPT in 2023.[18][19]

OpenAI collects data from ChatGPT users to train and fine-tune the service further. Users can upvote
or downvote responses they receive from ChatGPT and fill in a text field with additional feedback.[20][21]

ChatGPT's training data includes software manual pages, information about internet phenomena such
as bulletin board systems, and multiple programming languages.[22] Wikipedia was also one of the
sources of ChatGPT's training data.

2.3 ChatGPT Architecture


The original Transformer model is built on the encoder-decoder architecture, which consists of two
stacks of Transformer blocks as the encoder and decoder, respectively. The encoder adopts stacked
multi-head self-attention layers to encode the input sequence for generating its latent
representations, while the decoder performs cross-attention on these representations and
autoregressively generates the target sequence.

Since most language tasks can be cast as prediction problems based on the input, decoder-only
LLMs are potentially advantageous for implicitly learning how to accomplish tasks in a unified
way. GPT-2 and GPT-3 use a causal decoder architecture, whose main components are described
below. GPT-3 contains 96 transformer decoder layers. The term "decoder-only architecture"
usually refers to this causal decoder architecture.

• Input Sequence (Inputs): The input sequence is a sequence of tokens (e.g., words or sub-words)
that represent the text input.

• Input Embedding: The first step is to convert the input sequence into a matrix of vectors, where
each vector represents a token in the sequence. The input embedding layer maps each token to a
high-dimensional vector that captures the semantic meaning of the token.

• Self-Attention Mechanism: The self-attention mechanism allows the model to compute
relationships between different parts of the input sequence. It consists of query, key, and value
computations followed by an attention computation. The input vectors are first transformed into
three different representations using linear transformations; the model then computes a weighted
sum of the values, where the weights are based on the similarity between the query and key
representations. The weighted sum represents the output of the self-attention mechanism for each
position in the sequence.

• Multi-Head Self-Attention: The Transformer architecture uses multi-head self-attention, which
allows the model to focus on different parts of the input sequence and compute relationships
between them in parallel. In each head, the query, key, and value computations are performed with
different linear transformations, and the outputs are concatenated and transformed into a new
representation.

• Feedforward Network: The output of the multi-head self-attention mechanism is fed into a
feedforward network, which consists of fully connected layers and activation functions. The
feedforward network transforms the representation into the final output of the block.

• Add & Norm (Residual Connection and Layer Normalization): Each sub-layer is wrapped in a
residual connection followed by layer normalization, which helps stabilize the training process,
mitigate overfitting, and make the model easier to train.

• Positional Encoding: To capture the order of the tokens in the input sequence, a positional
encoding, a vector that represents the position of each token, is added to the input embedding.

• Stacking Layers: The Transformer architecture is stacked into a deep neural network by
repeating the multi-head self-attention mechanism and feedforward network multiple times.

• Output: The final output of the Transformer is a vector representation of the input sequence.
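
The following PyTorch sketch ties these components together in a single causal (decoder-only) block. The tiny dimensions, the single block, and the learned positional embedding are simplifying assumptions made for illustration; GPT-3 stacks 96 such blocks at far larger widths.

    # A minimal causal decoder block in PyTorch (illustrative sketch).
    import math
    import torch
    import torch.nn as nn

    class CausalSelfAttention(nn.Module):
        def __init__(self, d_model: int, n_heads: int):
            super().__init__()
            self.n_heads = n_heads
            self.d_head = d_model // n_heads
            self.qkv = nn.Linear(d_model, 3 * d_model)  # query/key/value projections
            self.out = nn.Linear(d_model, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, t, d = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # Split into heads: (batch, heads, time, d_head).
            q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                       for z in (q, k, v))
            scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
            # Causal mask: each position may only attend to itself and earlier ones.
            mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
            scores = scores.masked_fill(mask, float("-inf"))
            attn = scores.softmax(dim=-1)           # attention weights
            y = (attn @ v).transpose(1, 2).reshape(b, t, d)
            return self.out(y)

    class DecoderBlock(nn.Module):
        def __init__(self, d_model: int = 64, n_heads: int = 4):
            super().__init__()
            self.attn = CausalSelfAttention(d_model, n_heads)
            self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                     nn.Linear(4 * d_model, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = x + self.attn(self.norm1(x))  # residual connection + normalization
            return x + self.ffn(self.norm2(x))

    # Token and positional embeddings feed the stacked blocks.
    vocab, d_model, seq_len = 1000, 64, 8
    tok_emb = nn.Embedding(vocab, d_model)
    pos_emb = nn.Embedding(seq_len, d_model)
    tokens = torch.randint(0, vocab, (1, seq_len))
    x = tok_emb(tokens) + pos_emb(torch.arange(seq_len))
    print(DecoderBlock()(x).shape)  # torch.Size([1, 8, 64])
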

CHAPTER-3
ADOPTION AND ANALYSIS

3.1 Evolution of OpenAI's GPT

OpenAI's contributions to AI development are numerous, but ChatGPT's release will remain a
milestone in the history of generative AI.

3.1.1 GPT-1

Introduced in June 2018, GPT-1 was OpenAI’s first transformer-based language model. With 117
million parameters, GPT-1 was among the most prominent language models at the time. The
model used books as training data and could perform various tasks, including textual alignment,
semantic similarity, reading comprehension, commonsense reasoning, and sentiment analysis.

3.1.2 GPT-2

OpenAI introduced GPT-2 in February 2019. The model had 1.5 billion parameters and was
trained with information from the internet. It was able to perform a broader scope of tasks without
task-specific training.
OpenAI initially refrained from releasing the complete model due to concerns about misuse.
Instead, it gradually released smaller model versions for research purposes.

3.1.3 GPT-3

The company behind ChatGPT released GPT-3 in 2020. As of August 2023, it is the only GPT
model that can be fine-tuned. GPT-3 has 175 billion parameters and much more powerful
capabilities than the previous models, but concerns about its susceptibility to disinformation and
biases continued. So, instead of releasing an open-source model, OpenAI provided public access to
GPT-3 through an API. This allowed third parties to use the underlying technology while OpenAI
retained some control.
InstructGPT

Launched in January 2022, InstructGPT is a fine-tuned version of GPT-3. OpenAI's primary
goal with this model was to reduce offensive language and misinformation and provide answers
that humans consider helpful.

3.1.4 GPT-3.5

GPT-3.5, the model behind ChatGPT, is a fine-tuned version of GPT-3 that can understand and
generate natural language and code.
ChatGPT

ChatGPT was released to the public in November 2022. The technical capabilities of
InstructGPT and ChatGPT are almost identical: both models were trained using the Reinforcement
Learning from Human Feedback (RLHF) method.

3.1.5 GPT-4

OpenAI released its GPT-4 model to ChatGPT Plus paid subscribers in March 2023. The model
significantly improved ChatGPT’s capabilities, especially for complex tasks. It also reflects
OpenAI’s efforts to diminish the frequency of undesirable or harmful responses.

3.2 ChatGPT Adoption

Roughly 24% of Americans who are aware of ChatGPT have used it. The adoption rate varies with
age, education, and gender.

Younger and college-educated individuals are more likely to have used ChatGPT than older adults and
those with only a high school education.

3.3 Analysis of ChatGPT Performance on Exams

ChatGPT has been evaluated on a range of exams, including medical licensing exams. Its effective
exam performance can facilitate personalized learning, exam preparation, and tutoring services. It
can also provide instantaneous feedback to students and create a supportive learning environment,
making it an effective and convenient tool for students to prepare for exams and improve their
scores.
In this section, we summarize the performance of ChatGPT on different exams and present an
overall aggregate. It must be noted that some works did not present quantitative results. Table 2
highlights the performance of ChatGPT on several exams. Overall, ChatGPT obtained an average
accuracy of 59.53% across all exams.

An initial observation from this data is the broad range in accuracy results, indicating that
ChatGPT's performance is highly dependent on the specific exam or domain in question. In terms of
high performance, it is noticeable that ChatGPT scored remarkably well in the United States
Medical Licensing Exam, with accuracies of 89.5% and 94.6% reported by (Fijačko et al., 2023)
and (Kung et al., 2023), respectively. This suggests that ChatGPT can be highly effective in
understanding and answering complex medical-related questions. On the other hand, ChatGPT
struggled with certain subjects. Notably, (Antaki et al., 2023) reported relatively low accuracies of
55.8% and 42.7% in two Ophthalmology exams. Similarly, (Wang et al., 2023) reported low
performance for the Chinese National Medical Licensing Examination, raising questions about the
model's adaptability and consistency in certain subject areas.

CHAPTER-4
ADVANTAGES AND APPLICATIONS

4.1 Advantages of ChatGPT


It provides specific and quick responses to user queries, which can be beneficial for tasks that
require immediate assistance.

It remembers its previous interactions with the user in the same prompt tree and utilizes them to
provide information surrounding those responses.

It allows users to make follow-up corrections until they are satisfied with the response.

It has been trained to decline inappropriate requests.

With 24/7 availability, ChatGPT can provide responses and assistance around the clock, making it
accessible whenever you need it.

It has access to vast amounts of information up to its knowledge cutoff date, which can be helpful
for answering a wide range of questions.

ChatGPT can communicate in multiple languages, making it useful for a global audience.

It aims to provide impartial information, although, as discussed in the limitations below, biases
present in its training data can still surface.

It maintains consistency in responses and doesn't get tired or distracted, ensuring a uniform level
of assistance.

It can handle a large number of users simultaneously, making it suitable for businesses and
organizations with high traffic.

It does not require personal data from users to answer questions, although, as noted in Chapter 2,
conversations may be collected by OpenAI to improve the service.

ChatGPT can be a valuable educational resource, helping users learn about a wide range of topics.

4.2 Applications of ChatGPT

Despite being released to the public very recently, ChatGPT has already gained significant
research attention. In this section, we describe recent works investigating the use of
ChatGPT for various research and applications. This paper employs a scoping review
approach to explore recent research focusing on the application of ChatGPT across various
domains. This type of review is particularly suitable in this context due to the rapid pace of AI
advancements. This review style allows us to capture a broad spectrum of research on the
topic, providing insights on emerging evidence and identifying gaps in the current knowledge
base. To conduct a comprehensive literature search, we utilized Google Scholar as our primary
search engine. Google Scholar was chosen because of its expansive index of scholarly
literature across various disciplines and formats. Our search included a combination of terms
such as "ChatGPT" and "Large Language Models (LLMs)" to ensure the inclusion of all
potentially relevant works.

It is important to note that some of the papers included in this review were not peer-reviewed at
the time of writing. We chose to include these articles due to the fast-paced nature of the AI
field, where new advancements and findings may not yet have undergone the often lengthy
peer-review process. The inclusion of such articles offers a more current and comprehensive
overview of the field.
Thorp (Thorp, 2023) provided a qualitative assessment of ChatGPT responses to research
topics, such as education, literature, and scientific writing. ChatGPT provided an entertaining
writeup when asked to complete a scene from a classic play. In terms of education, ChatGPT
can provide factual answers but still has some way to go in writing essays. In another study
(Else, 2023), researchers were asked to distinguish between abstracts of scientific papers
written by ChatGPT and humans. The results are surprising: human evaluators correctly
identified only 68% of the ChatGPT-generated abstracts. De Angelis et al. (De Angelis et
al., 2023) discussed the evaluation of language models in light of ChatGPT and highlighted
potential ethical and practical challenges in medicine and public health. The main challenges
include the potential of AI-driven misinformation or “infodemic” that is often difficult to
discern.

In the field of medicine and public health, ChatGPT has already been explored for various
applications. Khan et al. (Khan et al., 2023) discussed several potential applications of ChatGPT in
medical education, including personalized learning and generating case studies. The authors also
pointed out that ChatGPT can be used in clinical management for documentation and decision
support. Rao et al. (Rao et al., 2023) evaluated the effectiveness of ChatGPT in providing clinical
decision support in radiology. The authors provided ChatGPT with text prompts such as “For
variant ‘Breast cancer screening. Average-risk women: women with <15% lifetime risk of breast
cancer.’, determine the single most appropriate imaging procedure” to evaluate its efficacy in breast
cancer screening and breast pain detection. ChatGPT performed relatively well for the former task
with 88.9% correct responses but only managed 58.3% correct responses for breast pain.

The role of ChatGPT and generative AI in helping urologists has also been discussed (Gabrielson et
al., 2023). ChatGPT can primarily help urologists in low-complexity tasks, giving them more time to
focus on patients. Hulman et al. (Hulman et al., 2023) utilized ChatGPT to answer frequently asked
questions about diabetes and asked healthcare employees to distinguish between human and
ChatGPT-generated answers. The authors found that the evaluators could identify answers
generated by ChatGPT correctly 59.5% of the time. The authors also concluded that despite
ChatGPT not being trained exclusively in medical data, it has clinical knowledge and can identify
information about disease management. Generating a medical report about a given topic may be
useful in pharmaceutical education. To this end, Zhu et al. (Zhu et al., 2023) prompted ChatGPT to
generate a mini-review on “lipid-based drug delivery systems.”

The authors concluded that ChatGPT can structure the topic well with meaningful conclusions.
However, there are question marks over the accuracy due to a lack of reliable citations. Shen et
al. (Shen et al., 2023) summarized other potential use cases and implications for ChatGPT in
medicine.
Researchers also investigated whether ChatGPT can answer medical exam questions. Kung et al.
(Kung et al., 2023) tested the performance of ChatGPT on the US medical licensing exam,
consisting of three standardized tests required for medical licensure in the US. ChatGPT performed
at the passing threshold level with 60% accuracy without specialized input from humans. Any
questions containing visual information, such as medical images, were removed.
The results demonstrate the potential of ChatGPT for medical education and assistance in clinical
decision-making. However, in a Chinese national medical licensing exam, ChatGPT performed
considerably lower, with 45.8% correct answers (Wang et al., 2023). In Ophthalmology, ChatGPT
was tested with questions from the Ophthalmic knowledge assessment program containing two
exams and obtained 55.8% and 42.7% accuracy, respectively (Antaki et al., 2023). For a basic and
advanced cardiovascular life support exam from the American Heart Association, ChatGPT
performed below the 84% passing threshold (Fijačko et al., 2023). However, its ability to provide
detailed rationales with reasonable accuracy makes it a potentially useful tool for self-learning and
exam preparation. Mbakwe et al. (Mbakwe et al., 2023) argued that the success of ChatGPT in
answering medical examinations boils down to the nature of these exams being rote memorization
rather than testing analysis and critical thinking.

One of the significantly anticipated applications of chatbots is in the domain of education. AI and
technology can be effective in education in several aspects, including personalized learning
(Shahriar et al., 2022). In this context, ChatGPT can enhance student participation, provide
experiential learning, and help educators in the evaluation of exams and content preparation
(Kasneci et al., 2023). Several researchers focused their studies on the impact of ChatGPT in
education (Rudolph et al., 2023; Lund & Wang, 2023; Mhlanga, 2023; Kasneci et al., 2023).
Potential concerns of ChatGPT in education include response bias (Lund & Wang, 2023;
Mhlanga, 2023), cheating (Rudolph et al., 2023), leakage of private data (Lund & Wang, 2023;
Mhlanga, 2023), and transparency (Mhlanga, 2023). Chatbots can also contribute effectively to
peer tutoring. Pardos and Bhandari (Pardos & Bhandari, 2023) found that 70% of the hints offered
by ChatGPT in elementary and intermediate Algebra topics could result in positive learning gains
for students.

Frieder et al. (Frieder et al., 2023) evaluated the mathematical capabilities of ChatGPT in helping
mathematicians with tasks like question answering and finding theorems. The researchers found
that ChatGPT displayed a level of mathematics proficiency below that of an average graduate
student. There is public concern about ChatGPT being used for plagiarism, and it is necessary to
create tools to detect such plagiarism.

To this end, Khalil and Er (Khalil & Er, 2023) attempted to utilize ChatGPT as a plagiarism
detection tool and found that responses created by ChatGPT can often go undetected by other
plagiarism checkers. Meanwhile, when ChatGPT was asked to check if the writing was generated
by itself, it performed better than other plagiarism detection software. Yang et al. (Yang et al.,
2023) explored ChatGPT’s ability in summarizing written texts and found that ChatGPT performs
on par with existing fine-tuning methods based on ROUGE scores. The authors also highlighted that
the current maximum input length of 5,000 tokens is a limitation in assessing ChatGPT's ability in
text summarization. Can ChatGPT improve student essays? To answer this question, Basic et al.
(Basic et al., 2023) conducted a study with nine students in the control group and nine in the
experimental group that used ChatGPT. The authors concluded that ChatGPT does not necessarily
improve essay quality because the control group outperformed the experimental group in most
criteria.
Bang et al. (Bang et al., 2023) evaluated the effectiveness of ChatGPT on ten reasoning tasks, such
as logical reasoning and commonsense reasoning.

4.3 Limitations of ChatGPT

ChatGPT certainly has the potential for diverse and interesting applications. However, users should
consider the limitations of the current model. In this section, we outline some of the current
limitations of ChatGPT.
ChatGPT may sound interesting and convincing, but don't take its word for it! Indeed, ChatGPT's
ability in forming meaningful and conversational sentences is quite impressive, but it may often
'hallucinate' responses (Alkaissi & McFarlane, 2023). Therefore, it is strongly recommended to
verify and fact-check any responses from ChatGPT.
ChatGPT makes errors in simple reasoning, logic, mathematics, and presenting factual information
(Borji, 2023). GPT-4, released in March 2023, is expected to significantly improve ChatGPT.
According to several unverified reports, the GPT-4 network is far more complex than its
predecessor, containing around 100 trillion parameters. The GPT-3 model, in comparison, is made
up of 175 billion parameters.
ChatGPT is currently limited to processing a maximum of 5,000 tokens of text as input. While this is
not a problem in most applications, it can be challenging in tasks like text summarization (see the
token-counting sketch below).
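
As a small illustration of working within such a limit, the sketch below counts tokens with OpenAI's tiktoken library. The encoding name is the one used by GPT-3.5/GPT-4-era models; the sample text and the 5,000-token threshold mirror the discussion above and are illustrative.

    # Counting tokens with tiktoken (assumes `pip install tiktoken`).
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # GPT-3.5/GPT-4 encoding
    text = "ChatGPT works by attempting to understand a text input..."
    n_tokens = len(enc.encode(text))
    print(n_tokens, "tokens;", "fits" if n_tokens <= 5000 else "exceeds the limit")
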
Moreover, the current interface of ChatGPT does not allow uploading images or audio files.
ChatGPT can produce code representations of visual images based on text prompts, but its drawing
skills are currently somewhat limited (Bang et al., 2023). In this context, researchers have recently
introduced a multi-modal language model, trained on multi-modal corpora like image-caption pairs,
that can perceive general modalities (Huang et al., 2023). Multi-modal systems can provide
applications such as image generation from text prompts and stem isolation from pop music.
ChatGPT has a tendency to give wordy and detailed responses unless explicitly asked not to.
Moreover, ChatGPT expresses fewer emotions than an average human and tends to be more
objective (Guo et al., 2023). Therefore, ChatGPT cannot replace the need for human connection or
be your friend! Similarly, it cannot be used for personal therapy or counseling, which require an
intimate human connection.

CHAPTER-5
CONCLUSION

5.1 Conclusion

In conclusion, ChatGPT stands as a remarkable advancement in natural language processing,
showcasing its ability to generate coherent and contextually relevant responses. Its widespread
applications, from assisting users with information to engaging in dynamic conversations, make it a
versatile tool. Despite its strengths, ongoing efforts to enhance its understanding of nuanced queries
and potential biases will be crucial for further refinement. As technology evolves, ChatGPT remains
at the forefront of innovative language models, paving the way for future developments in the field.

Large language models like ChatGPT offer several broader benefits. Firstly, their ability to
understand and generate human-like text leads to improved performance in a
multitude of natural language processing tasks, from translation to summarization. Their general-
purpose nature contributes to their versatility, making them applicable across various domains such
as customer service, content creation, programming help, and education. This broad applicability
also means that LLMs can make information and services more accessible, catering to those who
prefer or need to use natural language interfaces. Finally, for individual users, LLMs serve as
powerful learning and productivity tools. They can assist in learning new topics, brainstorming
ideas, and providing writing assistance, effectively becoming an invaluable resource in various
personal and professional endeavors. However, there are many ethical and privacy concerns about
ChatGPT that need to be addressed (Zhuo et al., 2023). For instance, some users have reported
ChatGPT's responses containing race and gender bias. Moreover, given its effectiveness, ChatGPT
may be used for unethical purposes in education, including cheating. In research, ChatGPT raises
ethical questions about copyright and plagiarism. Given their training on vast amounts of text data
from the internet, LLMs like ChatGPT can generate outputs that resemble or draw heavily from the
works they were trained on, even if they do not directly reproduce these texts.

References

1. Aczel, B., & Wagenmakers, E.-J. (2023). Transparency Guidance for ChatGPT Usage in
Scientific Writing. PsyArXiv. https://doi.org/10.31234/osf.io/b58ex

2. Adamopoulou, E., & Moussiades, L. (2020). An Overview of Chatbot Technology. In I.
Maglogiannis, L. Iliadis, & E. Pimenidis (Eds.), Artificial Intelligence Applications and
Innovations (pp. 373–383). Springer International Publishing.
https://doi.org/10.1007/978-3-030-49186-4_31

3. Alkaissi, H., & McFarlane, S. I. (2023). Artificial Hallucinations in ChatGPT: Implications in
Scientific Writing. Cureus, 15(2), e35179. https://doi.org/10.7759/cureus.35179

4. Antaki, F., Touma, S., Milad, D., El-Khoury, J., & Duval, R. (2023). Evaluating the Performance
of ChatGPT in Ophthalmology: An Analysis of its Successes and Shortcomings (p.
2023.01.22.23284882). medRxiv. https://doi.org/10.1101/2023.01.22.23284882
5. Aydın, Ö., & Karaarslan, E. (2022). OpenAI ChatGPT Generated Literature Review: Digital Twin
in Healthcare (SSRN Scholarly Paper No. 4308687). https://doi.org/10.2139/ssrn.4308687
6. Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung,
W., Do, Q. V., Xu, Y., & Fung, P. (2023). A Multitask, Multilingual, Multimodal Evaluation of
ChatGPT on Reasoning, Hallucination, and Interactivity (arXiv:2302.04023). arXiv.
https://doi.org/10.48550/arXiv.2302.04023
7. Basic, Z., Banovac, A., Kruzic, I., & Jerkovic, I. (2023). Better by you, better than me, chatgpt3 as
writing assistance in students essays (arXiv:2302.04536). arXiv.
https://doi.org/10.48550/arXiv.2302.04536
8. Bordt, S., & von Luxburg, U. (2023). ChatGPT Participates in a Computer Science Exam
(arXiv:2303.09461). arXiv. https://doi.org/10.48550/arXiv.2303.09461
9. Borji, A. (2023). A Categorical Archive of ChatGPT Failures (arXiv:2302.03494). arXiv.
https://doi.org/10.48550/arXiv.2302.03494

10. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A.,
Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T.,
Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., … Amodei, D. (2020). Language models
are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.),
Advances in Neural Information Processing Systems (Vol. 33, pp. 1877–1901). Curran Associates, Inc.
https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf

11. Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A.,
Brown, T. B., et al. (2021). Extracting Training Data from Large Language Models
(arXiv:2012.07805). arXiv. https://doi.org/10.48550/arXiv.2012.07805
