LLM & Prompt Engineering
Bridging the gap between human intent & AI capability
Shama Ugale
Lead QA Consultant, Thoughtworks
Email: shama.ugale@thoughtworks.com
LinkedIn:
X: https://x.com/UgaleShama
LLM (Large Language Model)
Meet Lalli
When Lalli hears, “Feeling hungry, I
would like to have some…”, she is
statistically more likely to say “biryani”,
“cherries”, or “food” than unrelated
words like “bicycle” or “book”,
simply because of exposure frequency.
Introducing LLMs
Computer programs that use neural
networks to predict the next words in a
sentence, based on patterns learned
from historical data.
For example, a language model trained
on all movie-related Wikipedia articles
can predict movie-related sentences well,
and real-world applications like Gmail
autocomplete rely on such models.
Lalli with divine powers
If Lalli had divine powers that let her
overhear conversations across the
neighborhood, schools, universities, and
even globally, she’d have much broader
knowledge.
With this massive exposure, Lalli could
now generate predictions about history,
nutrition, or poetry, demonstrating the
broad capabilities of an LLM.
How Do LLMs Learn?
LLMs are trained on vast amounts of text data
from the internet (books, articles, websites,
conversations).
The Goal: To predict the next word in a
sequence. This seems simple but forces the
model to learn grammar, facts, reasoning, and
even some creativity.
Technique: Self-supervised learning – the data
itself provides the supervision.
Example: If the model sees "The cat sat on
the...", it learns that "mat," "rug," or "couch" are
likely next words.
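To make the idea concrete, here is a toy Python sketch: a frequency-count “bigram” model, nothing like a real neural network, but it shows how the data itself supplies the (context, next-word) training signal.

```python
# Toy sketch: count which words follow a given word in a tiny corpus,
# then "predict" the most likely next word. Real LLMs use neural
# networks over billions of documents, but the training signal is the
# same: the next word already present in the data.
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "the cat sat on the couch",
    "the dog sat on the rug",
]

# Self-supervised: each sentence supplies (word, next-word) pairs.
next_word = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 1):
        next_word[words[i]][words[i + 1]] += 1

def predict(word: str, k: int = 3):
    """Return the k words most frequently seen after `word`."""
    return next_word[word].most_common(k)

print(predict("the"))  # [('cat', 3), ('rug', 2), ('mat', 1)]
```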
Reinforcement Learning from
Human Feedback (RLHF)
RLHF: a method where humans guide the
model’s behavior by rating its outputs (e.g.,
labeling them “good” or “toxic”); these ratings
become a reward signal used to fine-tune the model.
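A toy sketch of the idea. Real RLHF trains a separate reward model and updates the LLM with an RL algorithm such as PPO; the labels, responses, and numbers below are purely illustrative.

```python
# Toy RLHF sketch (not a real training loop): human labels on model
# outputs become a reward signal that nudges the model toward
# preferred answers and away from harmful ones.
preferences = [
    # (response, human_label) collected from human reviewers
    ("Here is a safe, step-by-step answer...", "good"),
    ("Insulting, toxic reply...", "toxic"),
]

reward = {"good": +1.0, "toxic": -1.0}

# Stand-in "policy" scores; in practice these are model parameters
# updated by an RL algorithm such as PPO.
scores = {resp: 0.0 for resp, _ in preferences}
learning_rate = 0.1
for resp, label in preferences:
    scores[resp] += learning_rate * reward[label]

print(scores)  # the preferred response drifts up, the toxic one down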
The "Transformer" Architecture
(The Engine)
Attention Mechanism: this lets the
model weigh the importance of every
other word in a sentence while
processing each word.
Example: the word “model” means different things in
“Selenium Page Object Model” and “language model”;
attention uses the surrounding words to tell them apart.
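A minimal NumPy sketch of scaled dot-product attention, the computation at the heart of the Transformer. The word vectors are random toy values, not a real model.

```python
# Scaled dot-product attention: each word builds its output as a
# weighted mix of all words, with the weights ("attention") computed
# from query/key similarity.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> attention weights
    return weights @ V                              # weighted sum of values

# Three toy word vectors (e.g. "Selenium", "page", "model")
np.random.seed(0)
X = np.random.randn(3, 4)
out = attention(X, X, X)  # self-attention: Q = K = V = the input
print(out.shape)          # (3, 4): one context-aware vector per word
```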
Key LLM Parameters (The Control
Knobs)
● Context Window: The maximum number of
tokens (words or sub-words) the model can
consider at one time.
● Temperature: Controls the randomness and
creativity of the output (low = focused,
high = more varied).
● Top-P (Nucleus Sampling): Filters the next
word choices based on a cumulative
probability threshold.
● Top-K: Restricts the next-token choices to the
k most probable words, however the probability
mass is distributed among them.
● Frequency/Presence Penalty: Discourages the
model from repeating the same words or
phrases.
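A sketch of how these knobs reshape the next-token distribution before sampling. The tokens and logits below are made up for illustration.

```python
import numpy as np

tokens = ["mat", "rug", "couch", "bicycle", "book"]
logits = np.array([3.0, 2.5, 2.0, 0.2, 0.1])  # made-up model scores

def sample(logits, temperature=1.0, top_k=None, top_p=None):
    logits = logits / temperature          # low T -> sharper, high T -> flatter
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                   # softmax -> probabilities
    order = np.argsort(probs)[::-1]        # most probable first
    if top_k is not None:
        order = order[:top_k]              # keep only the k best tokens
    if top_p is not None:                  # smallest set whose mass covers p
        cum = np.cumsum(probs[order])
        order = order[: np.searchsorted(cum, top_p) + 1]
    kept = probs[order] / probs[order].sum()  # renormalise what's left
    # Frequency/presence penalties (not shown) would lower the logits
    # of tokens the model has already used.
    return tokens[np.random.choice(order, p=kept)]

print(sample(logits, temperature=0.7, top_k=3))  # e.g. "mat"
```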
Prompt Engineering
The Art and Science of Communication
with AI
Guiding AI Behavior
Maximizing AI Value
Key components: Clarity, Context,
Constraints.
Common pitfalls
● Vague prompts → irrelevant output
● No context → misses edge cases
● Overly complex prompts → AI ignores
instructions
● Forgetting QA verification → always
validate AI-generated test cases
The Four Pillars of a Good
Prompt
Persona: Who should the AI pretend to
be? (e.g., "You are a senior QA analyst...")
Task: What exactly do you want it to do?
(e.g., "Your task is to identify security
vulnerabilities...")
Context: What background information
does it need? (e.g., "The website is
example-shop.com...")
Format: How should the output be
structured? (e.g., "Present your findings
in a bulleted list...")
Basic Prompting Demo
Scenario: You're testing an e-commerce login page.
Prompt (Good Example):
"You are a senior QA analyst. Your task
is to identify potential security
vulnerabilities in the login form of an
e-commerce website. The website is
example-shop.com. Analyze the form
for common vulnerabilities like SQL
injection, cross-site scripting (XSS),
and weak password policies. Present
your findings in a bulleted list."
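One way to run this prompt programmatically, sketched with the OpenAI Python SDK; the model name is illustrative, and any chat-completion API follows the same pattern.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "You are a senior QA analyst. Your task is to identify potential "
    "security vulnerabilities in the login form of an e-commerce "
    "website. The website is example-shop.com. Analyze the form for "
    "common vulnerabilities like SQL injection, cross-site scripting "
    "(XSS), and weak password policies. Present your findings in a "
    "bulleted list."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; use any chat model
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,      # low temperature for focused analysis
)
print(response.choices[0].message.content)
```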
Advanced Prompting Techniques
1. Chain-of-Thought Prompting:
● Encourage AI to "think step-by-step" or
"show its work."
● Improves accuracy and thoroughness for
multi-step problems.
● Excellent for comprehensive test case
generation.
2. Role-Playing / Persona Shifting:
● Assign the AI a specific persona (e.g.,
frustrated user, malicious hacker,
accessibility expert).
● Generates insights from different
perspectives, uncovering hidden issues.
● Crucial for exploratory testing and
empathetic design.
Advanced Prompt Template
Persona: [Role the AI should adopt, e.g., "Senior QA Analyst," "Security Auditor," "End User"]
Context: [System or Feature being tested, e.g., "E-commerce checkout flow," "User registration API," "Mobile banking app's transfer feature"]
Task: [Specific action verb and goal, e.g., "Generate test cases," "Identify vulnerabilities," "Analyze user feedback for patterns"]
Method (Optional but Recommended): [Specify a technique like "Chain-of-Thought," "Role-Play," "A/B comparison"]
Format: [Desired output style, e.g., "Bulleted list," "Table with columns: ID, Description, Expected Result," "JSON," "Markdown narrative"]
Constraints: [Limitations or specific rules, e.g., "No more than 10 test cases," "Exclude scenarios involving credit card numbers," "Focus only on payment processing"]
Examples (Optional): [Provide input/output examples if the task or format is complex]
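A small Python helper (illustrative, not a standard library) that assembles these fields into a single prompt string:

```python
# Build a prompt from the template fields above; optional fields are
# simply omitted when not supplied.
def build_prompt(persona, context, task, method=None, fmt=None,
                 constraints=None, examples=None):
    parts = [f"Persona: {persona}", f"Context: {context}", f"Task: {task}"]
    if method:
        parts.append(f"Method: {method}")
    if fmt:
        parts.append(f"Format: {fmt}")
    if constraints:
        parts.append(f"Constraints: {constraints}")
    if examples:
        parts.append(f"Examples: {examples}")
    return "\n".join(parts)

print(build_prompt(
    persona="Senior QA Analyst",
    context="E-commerce checkout flow",
    task="Generate test cases for the payment step",
    method="Chain-of-Thought: reason step by step before listing cases",
    fmt="Table with columns: ID, Description, Expected Result",
    constraints="No more than 10 test cases",
))
```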
Markdown Best Practices for Effective QA Prompts
● Headings (#, ##, ###): Organize sections
● Bulleted Lists (-): Enumerate test cases
● Numbered Lists (1.): Sequential steps
● Tables: Structured test cases
● Code Blocks (```): Scripts or JSON
● Bold/Italics: Highlight constraints
● Checklists (- [ ]): Exploratory QA tracking
Hands-On Mini Project:
Post Creation Feature
Your Task: Work with a neighbor. You are
all QA engineers for a new social media
app. The feature under test is post
creation.
Feature Details: Users can write text
(limit 280 chars), upload one image
(JPG/PNG, max 5MB), and add an
optional location.
Prompt: draft one using the four pillars above, then compare with your neighbor.
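One possible prompt for the exercise, combining the four pillars with the markdown practices above. A sketch to iterate on, not the only answer.

```
# Persona
You are a QA engineer for a new social media app.

# Task
Generate test cases for the post creation feature. Think step by step:
cover happy paths, boundary values, and invalid inputs.

# Context (feature details)
- Text posts limited to 280 characters
- One image upload, JPG/PNG only, max 5MB
- Optional location field

# Format
A table with columns: ID | Description | Steps | Expected Result

# Constraints
- No more than 15 test cases
- Include boundaries (0, 280, 281 chars; 5MB vs just over 5MB files)
```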
Best Practices & Future Outlook
Iterate and Refine: Your first prompt might not be
perfect. Experiment!
Be Specific, but Concise: Avoid ambiguity, but don't
add unnecessary filler.
Use Examples: If the output format is complex,
provide a small example.
Understand AI Limitations: AI can hallucinate or
generate plausible-sounding but incorrect information.
Always verify critical outputs.
Thank you for your attention
Any Questions?