
Prompt Engineering in Large Language Models

Devansh Bansal
Computer Engineering Department, San Jose State University,
San Jose, USA
Email: devansh.bansal@sjsu.edu

Abstract: Within the dynamic realm of natural language processing (NLP) and artificial intelligence (AI), large language models (LLMs) have emerged as revolutionary instruments that have increased human-AI interaction manifold. These models have an exceptional capacity to produce text that closely resembles human language, using input prompts as a basis. They can understand information related to virtually any domain and provide suitable, correct responses, e.g., interpreting mathematical formulas, writing code, and summarizing text. However, the crucial factor in realizing their complete capabilities and guaranteeing smooth collaboration between humans and AI lies in the field of prompt engineering, which combines both artistic and scientific elements. The goal of this study is to delve into the world of LLMs, understand how they work, and examine how prompts affect performance in terms of correctness and speed. We examine different prompt engineering methods and propose a layered architecture for prompting foundational models such as GPT and DALL-E.

Index Terms—Prompt Engineering; Demystifying Large Language Models; Improving LLM Performance Using Prompt Engineering; Prompts in LLMs

1. Introduction

Recent advances in the fields of Natural Language Processing and Generative AI have led to the development of models like GPT-3, DALL-E, and Claude that demonstrate strong abilities in language understanding, text generation, and reasoning. However, successfully leveraging their capabilities is contingent on effective prompt engineering: the careful construction of text prompts to provide optimal context and direction. These powerful models have the potential to improve the user experience in different areas. For example:

1. Students are using different LLMs as study guides or personal AI tutors. [3]
2. Content creators and artists are leveraging models like DALL-E to generate new content and ideas.
3. Data engineers are using these models to breeze through large datasets and texts.

We envision a future comprising multiple AI agents providing services in specific niches, and we will see the rise of many tools and applications built on top of foundational models such as GPT and DALL-E. Most of these apps and services essentially provide three main functionalities:

1. Text understanding
2. Content generation
3. Reasoning

Currently, researchers in this domain are trying to solve two main problems. The first is that these models are very expensive to train, and the training data strongly affects their performance; it is therefore crucial that the data is clean in all aspects, neither biased nor incorrect. The second is that once these models are trained, the only way to interact with them is via text prompts or attachments comprising text files.

Consider the following two prompts given to ChatGPT running on GPT-3.5.

Figure: Prompt 1

Figure: Prompt 2

The responses to the above prompts varied greatly. In the first case, ChatGPT gave a very abstract answer without delving into the critical details, whereas in the second, the model gave a very specific and convincing answer. This sensitivity illustrates the need for further research into systematic and equitable prompt engineering methods that facilitate more robust human-AI collaboration. [1]

Currently, a lack of consensus on standards, best practices, and transparency in prompt design presents barriers to real-world deployment. Prior studies have also tended to focus only on narrow aspects of prompt optimization or tasks, with insufficient consideration of fairness and accountability. Recent analysis reveals high variability in outputs even from the same LLM when prompted differently, underscoring the degree of control that prompt engineering permits.

In this research, we first study the history of AI's evolution; we then study how LLMs came into being and their applications in today's world. Finally, we discuss prompt engineering and evaluate how it affects the quality of responses in terms of correctness and speed.

2. Evolution of AI

Humans started talking about AI in the early 20th century. At that time, cinema and art presented a magical creature that could think like humans, or even better than them. This thinking could materialize into something real only around the late 1950s, which is also when John McCarthy introduced LISP, the first AI language.

From the 1980s to 2011, the world witnessed several breakthroughs in AI. The first expert system, known as XCON, was introduced, and an AI agent beat the reigning human world chess champion. These events acted like a spark that ignited the AI boom around 2010. Since then, there has been exponential advancement in AI, both in terms of scientific breakthroughs and real-world adoption.

Last year, when OpenAI released its first AI chat application, ChatGPT, powered by its most capable large language model, GPT, the entire world was shaken. The app reached its first million users in a span of five days, the fastest in history. This also led to a spurt in research in the area: many people started studying these LLMs, how they work, and how to make them work better. This is how prompt engineering came into being. Some even say that prompts are the new language of today's world and that humans should use better prompts while communicating amongst themselves as well.

3. Related Work

To understand the current state of work and the different sub-domains within the large field of artificial intelligence, we studied multiple research papers. From this background research, it was evident that the following four themes correlate directly with the topic of this study and deserve attention. Understanding these themes helps in navigating the large and still immature world of large language models and generative artificial intelligence.

A. Non-determinism in LLMs

Large Language Models (LLMs) possess an inherent degree of opacity, akin to a black box, where the intricacies of their functioning are not fully transparent. While their fundamental operation mirrors the interconnected networks and weighted nodes within the human brain, the precise mechanism by which these models generate text responses from user prompts remains largely uncharted. Our research delved into this mystery by employing varied prompts for a single use case. Remarkably, we uncovered a robust correlation between the internal processes by which these models formulate responses and the dynamics of human-human interaction. This underscores the paramount importance of acquiring a nuanced understanding of how to pose effective questions to LLMs.

One technique that really stood out was chain-of-reasoning prompting. Researchers built prompts as a chain of reasoning, similar to how humans structure their thoughts to form sentences. Surprisingly, just by restructuring the prompts, the performance of the model improved by 60%: it generated more correct responses with verified facts.
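
For illustration, consider a hedged example of such restructuring (the arithmetic task here is our own illustration, not one of the original experiments). A plain prompt might read, "What is 17 × 24?" A chain-of-reasoning prompt instead reads, "Compute 17 × 24. Break the multiplication into smaller steps, show each intermediate result, and then state the final answer." By asking for explicit intermediate steps, the prompt gives the model the same scaffold humans use to organize their thinking, which also makes the resulting answer easier to verify.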

B. Correctness and Reliability of LLMs

In light of the increasing integration of Large Language Models (LLMs) into practical applications, ensuring the accuracy and dependability of model outputs is crucial. A notable challenge in this regard is the occurrence of hallucinations, where the model interprets user prompts in unintended ways. Effectively addressing this challenge is essential for the meaningful use of LLMs across various fields. Prompt engineering emerges as a strategic framework to tackle this issue by refining user inputs for more aligned outcomes.

Researchers have explored diverse architectural approaches to alleviate hallucinations; investigating innovative model designs is a critical avenue for minimizing interpretative deviations in LLM responses. Apart from architectural considerations, two prominent factors significantly influencing response quality have surfaced: the quality and diversity of the training data, and the precision of the input prompts. The interplay between these factors is complex, as the model's ability to generate accurate responses depends on both the depth and variety of its training data and the specificity and clarity of the prompts encountered in real-world scenarios.

As researchers work towards optimizing the accuracy and reliability of LLMs, the exploration of architectural approaches and a nuanced understanding of the relationship between training data and prompts take precedence in ongoing investigations. By deepening insight into these multifaceted aspects, we can advance effective prompt engineering strategies, contributing to the broader objective of refining the precision and reliability of LLM-generated outputs.

C. Prompting techniques and model behavior

In this section, we examined in depth various prompting techniques proposed by fellow researchers. A noteworthy study in this domain focused on prompt engineering tailored for text-to-image generative AI models. Its authors undertook a comprehensive exploration, executing five distinct experiments that varied prompts, random seeds, optimization lengths, styles, and subjects. Notably, their findings indicated that incorporating prompt seeds and iteration lengths into the final prompts improved model responses. Moreover, the study highlighted that different styles were interpreted distinctively by the models, revealing nuances in the models' responsiveness. The researchers then offered insightful prompt guidelines to inform the design of applications built on such models. [2]

D. Applications of LLMs

Our research is based on the assumption that the adoption of AI across different spheres of human life will increase, and that many of the mobile apps we use today will be taken over by AI agents providing specific experiences to users.

4. Project Description and Motivation

A. Purpose and Significance

In today's world, AI is significantly changing experiences and systems across different spheres, and in a couple of years this interaction between humans and AI will increase much more. It is important that everyone is aware of this technology and how to leverage it for their own use cases. LLMs represent a fundamental shift in how we interact with intelligent systems, and it is critical that we understand how they work to ensure their full utilization. In this project, we scope our goals to understanding these models from a prompting perspective and providing a set of guidelines so that users can write better prompts and get better results.

B. Problem Statement

In today's AI landscape, prompts play a crucial role. Large Language Models (LLMs) tend to respond differently to questions depending on who's asking, and this variability can lead to different user perceptions of a model's performance. We see a lot of potential in these models, and there are many interesting ways in which we can use them if we make user interactions simpler and more intuitive. Ideally, the model should accurately understand and respond to user questions in a way that feels human. To achieve this, it is crucial to understand how these models function, how they react to different inputs and prompts, and what algorithmic tweaks are needed to improve response accuracy and speed.

C. Objectives

This study aims to achieve the following main objectives:

● Comprehend the workings and behavior of Large Language Models (LLMs) across diverse scenarios.
● Explore the applications of LLMs to gain a comprehensive understanding of their practical uses.
● Investigate the interaction dynamics between prompts and LLMs.
● Analyze strategies for crafting improved prompts to enhance model responses.
● Propose guidelines for architecture and design to facilitate the development of applications that optimize prompt-based interactions.

D. Motivation

AI has advanced rapidly in the last few years, but awareness and user adoption are still far behind. An estimated 80% of the world's population still doesn't know about it, and even among the remaining 20%, most don't know how to ask these models the right questions. It is very important to do research in this area and simultaneously create awareness about it in the community.

5. Technical Aspects

We propose an iterative prompt optimization pipeline with reinforced active learning for automated generation and testing customized to target outcomes. A model-agnostic controller architecture updates prompts based on modular validation signals, allowing rapid tuning without human input. Before discussing the architecture and the implementation plan, let's first discuss the foundational models that we will use interchangeably under the hood.

A. GPT (Generative Pre-trained Transformer)

GPT, a pioneering model in the landscape of language models, has contributed significantly to advances in prompt engineering for Large Language Models (LLMs). Developed by OpenAI, GPT employs the transformer architecture and is pre-trained on massive datasets, enabling it to generate coherent and contextually relevant text. Its ability to understand and respond to natural language prompts has made it a cornerstone of the field. Researchers have extensively explored techniques to fine-tune GPT for specific tasks, highlighting its flexibility in adapting to diverse applications.

B. DALL-E

DALL-E, an innovative model by OpenAI, extends the capabilities of generative models into the realm of images. Unlike traditional language-centric LLMs, DALL-E operates as a generative model for visual content: by conditioning on textual prompts, it generates images that correspond to the given descriptions. Understanding and manipulating prompts in the visual domain presents unique challenges, and DALL-E's architecture offers insights into the intersection of language and image prompt engineering.

C. CLAUDE (Contrastive Language-Image Pre-Training)

CLAUDE represents a novel approach to prompt engineering by bridging the gap between language and image modalities. Developed to understand cross-modal relationships, CLAUDE is pre-trained on datasets containing both textual and visual information. This model showcases the potential of prompt engineering in scenarios where the input involves both language and image components. Analyzing CLAUDE sheds light on strategies to effectively engineer prompts in multimodal settings.

D. GEMINI (Generative Multimodal Interaction)

GEMINI, a cutting-edge model, stands at the forefront of multimodal prompt engineering. By integrating information from both textual and visual prompts, GEMINI aims to produce coherent and contextually relevant responses across different modalities. The study of GEMINI provides valuable insights into the challenges and opportunities presented by prompt engineering in scenarios requiring interaction between diverse modes of input.

Each of these foundational models contributes uniquely to the understanding and advancement of prompt engineering in LLMs. Their architectures and pre-training approaches offer valuable insights for researchers aiming to optimize prompt-based interactions in various applications. [4]

5.1 Working of LLMs

Large Language Models (LLMs) represent a transformative leap in natural language processing, with intricate architectures that enable them to comprehend and generate human-like text. A cornerstone in this domain is the transformer architecture, widely adopted in models like GPT (Generative Pre-trained Transformer). This architecture relies on self-attention mechanisms that allow the model to weigh the significance of different words in a sequence, fostering improved contextual understanding.
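
To make the self-attention idea concrete, the following is a minimal sketch of single-head scaled dot-product attention in NumPy. The shapes, the single-head simplification, and the random toy inputs are our illustrative assumptions; production transformers such as GPT stack many attention heads with feed-forward layers, residual connections, and learned positional information.

```python
# A minimal sketch of scaled dot-product self-attention, the core operation
# of the transformer architecture described above (single head, toy sizes).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how strongly each token attends to the others
    weights = softmax(scores, axis=-1)        # normalize scores into attention weights
    return weights @ V                        # weighted mixture of value vectors

# Toy usage: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)  # (4, 8) contextualized token representations
```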

Pre-training is a crucial phase in the development of LLMs. It involves training the model on massive corpora to learn linguistic patterns and contextual relationships. This process equips the model with a broad understanding of language, enabling it to generate coherent text across various contexts. GPT exemplifies this pre-training paradigm, leveraging a diverse range of internet text to acquire language proficiency.

Fine-tuning is the subsequent step, tailoring the pre-trained model to specific tasks. This phase involves exposing the model to task-specific datasets, allowing it to adapt its knowledge to particular domains. Researchers have explored diverse fine-tuning strategies to enhance the performance of LLMs in applications such as text completion, summarization, and question answering.
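
As a sketch of what this fine-tuning step can look like in practice, the snippet below adapts a small pre-trained causal language model to a task-specific corpus. It assumes the Hugging Face transformers and datasets libraries; the file name task_corpus.txt and the hyperparameters are hypothetical placeholders, not the setup of any study discussed here.

```python
# A minimal fine-tuning sketch: adapt a pre-trained GPT-2 to a task corpus.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")   # pre-trained weights
tokenizer.pad_token = tokenizer.eos_token              # GPT-2 has no pad token by default

# Hypothetical task-specific corpus, e.g. summarization pairs rendered as text.
data = load_dataset("text", data_files={"train": "task_corpus.txt"})

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True,
                    max_length=512, padding="max_length")
    out["labels"] = out["input_ids"].copy()  # causal LM objective: predict next token
    return out

train = data["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=train,
)
trainer.train()  # adapts the general-purpose model to the target task
```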
Figure: LLM server system architecture

5.2 System architecture and design

Upon comprehending the intricacies of Large Language Models (LLMs) and their applications in contemporary settings, exemplified by platforms like ChatGPT and Bard, the significance of furnishing pertinent parameters within the user prompt becomes evident. Essential components such as context and related data play a pivotal role in steering the model towards generating contextually relevant and accurate responses. In our proposed solution, we advocate incorporating an additional layer in the implementation pipeline. This layer is designed to refine the user's input, augmenting it with the necessary metadata before it is transmitted to the underlying LLM. The objective is to enhance the contextual understanding of the model by ensuring that vital information is explicitly provided, thereby optimizing the prompt for more precise

and tailored responses. This augmented layer acts as a crucial intermediary step, fine-tuning the user's input to better align with the nuanced requirements of the LLM and, consequently, fostering improved performance in various application scenarios. A minimal sketch of such a layer follows the figure below.

Figure: Addition of Prompt Engineering layer
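
Below is that minimal sketch of the proposed prompt-refinement layer in Python. The PromptContext fields, the refine_prompt wording, and the llm.generate call are hypothetical placeholders for whatever context store and foundational-model client an implementation actually uses.

```python
# A sketch of the intermediary layer: augment the raw user input with
# explicit context and metadata before forwarding it to the underlying LLM.
from dataclasses import dataclass, field

@dataclass
class PromptContext:
    role: str = "general assistant"                   # persona the model should adopt
    domain: str = "general"                           # subject area of the question
    history: list[str] = field(default_factory=list)  # prior conversation turns

def refine_prompt(user_input: str, ctx: PromptContext) -> str:
    """Enrich the raw prompt so vital information is explicitly provided."""
    parts = [
        f"You are a {ctx.role} answering a question in the {ctx.domain} domain.",
        "Relevant conversation so far:" if ctx.history else "",
        *ctx.history,
        f"Question: {user_input}",
        "Answer specifically and explain your reasoning step by step.",
    ]
    return "\n".join(p for p in parts if p)

# Usage: the augmented prompt is what actually reaches the foundational model.
ctx = PromptContext(role="math tutor", domain="algebra",
                    history=["We already covered factoring."])
augmented = refine_prompt("How do I solve x^2 - 5x + 6 = 0?", ctx)
# response = llm.generate(augmented)   # hypothetical call to GPT, Bard, etc.
```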
The pipeline centers on an ensemble of parallel Prompt Optimizer modules consisting of 1) a Prompt Generator, 2) Performance Validators, 3) Risk Validators, and 4) an Active Learning Controller. This enables both targeted optimization for metrics like relevance and holistic safeguards against potential harms.

The prompt generator contains a lattice-based sequencer for creative exploration, guided by the controller's learned reward patterns. Performance validators then score prompts on dimensions like semantics, fluency, and accuracy, while risk validators estimate the likelihood of toxicity, bias, and deception. The controller aggregates validation signals into an overall reward, approving high-quality prompts for downstream use while rejecting or regenerating harmful ones. Active learning cycles continually expand dataset diversity, and human-in-the-loop options are available for preference learning.
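
The loop below sketches how these four modules could fit together. Every class, scoring function, and threshold is a schematic stand-in: the real generator is the lattice-based sequencer described above, and the real validators are learned models rather than the toy heuristics used here.

```python
# Schematic sketch of the generator -> validators -> controller loop.
import random

class PromptGenerator:
    """Stand-in for the lattice-based sequencer: perturbs a seed prompt."""
    SUFFIXES = ["Be specific.", "Explain step by step.", "Cite sources."]
    def propose(self, seed: str) -> str:
        return f"{seed} {random.choice(self.SUFFIXES)}"

def performance_score(prompt: str) -> float:
    # placeholder for learned semantics / fluency / accuracy validators
    return min(1.0, len(prompt) / 50)

def risk_score(prompt: str) -> float:
    # placeholder for learned toxicity / bias / deception validators
    return 0.05

class Controller:
    """Aggregates validation signals and approves or regenerates prompts."""
    def __init__(self, min_quality: float = 0.6, max_risk: float = 0.2):
        self.min_quality, self.max_risk = min_quality, max_risk
    def approve(self, prompt: str) -> bool:
        return (performance_score(prompt) >= self.min_quality
                and risk_score(prompt) <= self.max_risk)

gen, ctl = PromptGenerator(), Controller()
candidate = gen.propose("Summarize the attached report.")
for _ in range(10):                      # regenerate rejected prompts, bounded
    if ctl.approve(candidate):
        break
    candidate = gen.propose("Summarize the attached report.")
print(candidate)                         # an approved prompt for downstream use
```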

5.3 Prompt Optimization Approach

We employ a reinforced algorithm with Monte Carlo rollouts for the controller to learn superior prompt design patterns, using interim scores from the validators as reward signals. Cross-entropy-method hyperparameter tuning prevents overfitting, and transfer learning from human demonstrations augments policy learning.

We also introduce a novel hybrid reward smoothing regularization, blending short- and long-term reward estimates to prevent instability while still allowing sharp optimization. More formally:

R_t = α · R′_t + (1 − α) · R̃_t

where R_t is the reward at time t, R′_t is the Monte Carlo rollout estimate, R̃_t is the exponentially smoothed long-term reward, and α is a tuned weighting coefficient. This framework outperforms prior programmatic optimization techniques across a breadth of language use cases while maximizing safety.
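
A few lines of Python suffice to illustrate the smoothing rule. The exponential-smoothing rate beta and the example reward sequence are assumptions made for illustration; the formula above only fixes how R′_t and R̃_t are blended.

```python
# Hybrid reward smoothing: R_t = alpha * R'_t + (1 - alpha) * R~_t,
# where R'_t is the Monte Carlo rollout estimate and R~_t an
# exponentially smoothed long-term reward.
def hybrid_rewards(rollout_estimates, alpha=0.7, beta=0.9):
    blended, long_term = [], 0.0
    for r_prime in rollout_estimates:
        long_term = beta * long_term + (1 - beta) * r_prime  # exponential smoothing
        blended.append(alpha * r_prime + (1 - alpha) * long_term)
    return blended

print(hybrid_rewards([0.2, 0.9, 0.4, 0.8]))  # blends sharp and smoothed signals
```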
5.4 Safety and Fairness Considerations

We implement proactive measures for risk reduction in prompts, including the following (combined into a single gate in the sketch after this list):

● Toxicity classifiers assess stereotyping, microaggression, and politeness. [5]
● Entailment detectors identify logically unsupported claims.
● Confidence estimators score uncertainty levels.
● Adversarial sanity checks probe prompts with contrastive candidates and boundary testing.
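
The sketch below shows one way these validators could be combined into a single pre-deployment gate. Each scoring function is a placeholder returning a fixed value; in practice each would wrap a trained classifier or estimator.

```python
# Schematic safety gate over the validators listed above (all placeholders).
def toxicity(prompt: str) -> float:
    return 0.02   # placeholder: stereotyping / microaggression / politeness classifier

def unsupported_claims(prompt: str) -> float:
    return 0.10   # placeholder: entailment detector for unsupported claims

def uncertainty(prompt: str) -> float:
    return 0.15   # placeholder: confidence estimator

def passes_safety_gate(prompt: str, threshold: float = 0.2) -> bool:
    # a prompt must clear every validator before it is approved for use
    scores = (toxicity(prompt), unsupported_claims(prompt), uncertainty(prompt))
    return max(scores) < threshold

print(passes_safety_gate("Explain the leave policy to a new employee."))  # True
```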
The documentation required for release also encompasses fairness assessments such as subgroup specificity analysis, prompt diversity metrics, and declarations of intended failure modes. Structured vetting procedures prior to deployment are mandated, along with monitoring requirements.

These measures codify transparency best practices for ethical prompt engineering as AI becomes further entwined with decision systems. Restricting optimization solely to performance risks overlooking complex harms; our validators and proactive mitigations address this while still allowing rapid and adaptable prompting.

6. Conclusion

The use of prompt engineering further refines LLM behavior. Prompt engineering involves formulating inputs in a manner that elicits desired outputs from the model. Researchers have investigated various prompt engineering techniques to improve response accuracy and relevance. Understanding the inner workings of LLMs is crucial for effective prompt engineering, as it allows researchers to manipulate inputs strategically.

Despite their remarkable capabilities, LLMs are not devoid of challenges. Non-determinism in model responses poses a significant hurdle: identical prompts may yield different responses due to the model's inherent stochastic nature. This introduces unpredictability, necessitating careful consideration in applications where consistency is paramount.

Interpreting LLM behavior involves delving into the attention mechanisms within the transformer architecture. Attention maps provide insights into which parts of the input sequence the model focuses on during processing. This interpretability aspect is vital for gaining transparency

into the decision-making process of LLMs, particularly in critical domains such as healthcare and legal applications.

In conclusion, users should follow a four-step approach when formulating prompts:

1. Set up as much context as possible in the question.
2. Use correct spelling and grammar (to help prevent hallucinations).
3. Start small and gradually build up the conversation (chain of reasoning).
4. Imagine you are talking to a human, not an AI.
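
As a hedged illustration of the four steps together (the example task is ours, not drawn from the experiments above), compare the bare prompt "Explain recursion" with: "I am a second-year computer science student preparing for an exam (context). Please explain recursion in Python using correct terminology (careful phrasing). Start with a one-line definition, then walk through one simple example (building up gradually), the way a tutor would explain it in conversation (human framing)."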
7. References

1. M. K. Pehlivanoğlu, M. A. Syakura and N. Duru, "Enhancing Paraphrasing in Chatbots Through Prompt Engineering: A Comparative Study on ChatGPT, Bing, and Bard," 2023 8th International Conference on Computer Science and Engineering (UBMK), Burdur, Turkiye, 2023, pp. 432-437, doi: 10.1109/UBMK59864.2023.10286606.

2. Y. Feng et al., "PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation," IEEE Transactions on Visualization and Computer Graphics, doi: 10.1109/TVCG.2023.3327168.

3. L. Rai, C. Deng and F. Liu, "Developing Massive Open Online Course Style Assessments using Generative AI Tools," 2023 IEEE 6th International Conference on Electronic Information and Communication Technology (ICEICT), Qingdao, China, 2023, pp. 1292-1294, doi: 10.1109/ICEICT57916.2023.10244824.

4. S. Mishra, D. Khashabi, C. Baral, Y. Choi and H. Hajishirzi, "Reframing Instructional Prompts to GPTk's Language," 2021, https://doi.org/10.48550/arXiv.2109.07830.

5. J. Hernández-Orallo, "AI Safety Landscape: From Short-Term Specific System Engineering to Long-Term Artificial General Intelligence," 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Valencia, Spain, 2020, pp. 72-73, doi: 10.1109/DSN-W50199.2020.00023.

6. L. Beurer-Kellner, M. Fischer and M. Vechev, "Prompting Is Programming: A Query Language for Large Language Models," Proceedings of the ACM on Programming Languages, 7(PLDI), 2023, pp. 1946-1969, https://doi.org/10.1145/3591300.

7. M. Kuzlu, Z. Xiao, S. Sarp, F. O. Catak, N. Gurler and O. Guler, "The Rise of Generative Artificial Intelligence in Healthcare," 2023 12th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, 2023, pp. 1-4, doi: 10.1109/MECO58584.2023.10155107.

8. J. Mange, "Effect of Training Data Order for Machine Learning," 2019 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 2019, pp. 406-407, doi: 10.1109/CSCI49370.2019.00078.

9. K. K. Phokela, S. Sikand, K. Singi, K. Dey, V. S. Sharma and V. Kaulgud, "Smart Prompt Advisor: Multi-Objective Prompt Framework for Consistency and Best Practices," 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), Luxembourg, Luxembourg, 2023, pp. 1846-1848, doi: 10.1109/ASE56229.2023.00019.

10. Z. G. Yang and N. Ligeti-Nagy, "Improve Performance of Fine-Tuning Language Models with Prompting," Infocommunications Journal, 15(Special Issue), 2023, pp. 62-68, https://doi.org/10.36244/icj.2023.5.10.

11. Y. Miao, S. Li, J. Tang and T. Wang, "MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models," 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 2023, pp. 25-30, doi: 10.1109/ICME55011.2023.00013.