
Module 1 - LLM Fundamentals

Module 1 of the learning guide focuses on Large Language Models (LLMs) and their applications in global health, covering core concepts, the Transformer architecture, and practical implications. It highlights the capabilities and limitations of LLMs, including their ability to process multimodal data and the importance of understanding context windows. The module also discusses the current LLM ecosystem, detailing proprietary and open-source models, and provides insights for global health consultants on evaluating and implementing LLM technologies.

AI Applications for Global Health

A Practical Learning Guide

MODULE 1: LLM FUNDAMENTALS & THE EVOLVING AI LANDSCAPE

Nathan Miller, PhD
& Google Gemini, Google Deep Research, Google AI Studio, Midjourney, Synthesia, Google Flow, Canva

May 2025
Module 1: LLM Fundamentals & the Evolving AI Landscape

Welcome to Module 1 of your learning guide to Large Language Models (LLMs) for
global health. This foundational module is designed to equip you with a robust and
current understanding of LLM core concepts, the rapidly evolving AI ecosystem,
frontier developments with a focus on healthcare applications, and strategies for
staying informed. This module will deepen your understanding of LLMs, enabling
you to leverage these technologies effectively and responsibly.

1.1: Core LLM Concepts
Understanding the fundamental principles behind LLMs is crucial for appreciating their
capabilities, limitations, and how they can be applied to your work in global health. This section
will break down these core concepts in accessible terms, moving beyond a general background
to an expert-level understanding of their application.

How LLMs Learn and Work

LLMs are a sophisticated type of AI designed to understand, process, and generate human-like
language. Unlike traditional software with explicit programming, LLMs learn by analyzing vast
amounts of text data, identifying intricate patterns, statistical relationships, and contextual
associations between words and phrases. They essentially learn language rules, common
knowledge, and stylistic nuances implicitly from this data. Their core function is predicting the
most likely next word (or token) in a sequence given the preceding context, which allows them
to generate coherent text. This is typically achieved through self-supervised training on
unlabeled text. It's vital to remember that LLMs are advanced pattern-matching systems, not sentient
entities, which helps set realistic expectations.
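Next-token prediction can be made concrete with a small sketch. The context, candidate tokens, and probabilities below are invented for illustration; a real LLM derives such a distribution from billions of learned parameters:

```python
# Toy illustration of next-token prediction. The probabilities are
# invented placeholders, not output from a real model.

context = "The patient was diagnosed with"

# Hypothetical learned probabilities for the next token
next_token_probs = {
    "malaria": 0.42,
    "tuberculosis": 0.31,
    "a": 0.15,
    "happiness": 0.01,
    "bicycle": 0.001,
}

# Greedy decoding: pick the single most likely continuation
predicted = max(next_token_probs, key=next_token_probs.get)
print(context, predicted)  # -> The patient was diagnosed with malaria
```

Generating a full sentence is simply this step repeated: each chosen token is appended to the context and the model predicts again.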

The Transformer Architecture: The Engine Behind LLMs

The remarkable capabilities of modern LLMs are largely due to the Transformer architecture,
introduced by Google researchers in their 2017 paper "Attention Is All You Need". This marked a
significant shift from older models like Recurrent Neural Networks (RNNs) that processed
language sequentially, creating bottlenecks for long texts. The Transformer enables parallel
processing, analyzing all words in a sequence simultaneously, which is faster and allows for a
better grasp of word relationships regardless of their position. The original Transformer uses an
encoder-decoder structure: the encoder reads the input and creates a numerical
representation of its meaning, and the decoder uses this representation to generate the output
(many modern LLMs, including the GPT series, are decoder-only variants). This parallel
processing capability is key to training LLMs on massive datasets.

Attention Mechanisms: How LLMs Focus on What Matters

The "secret sauce" of Transformers is the attention mechanism, particularly self-attention.
Similar to how humans focus on relevant words when reading, attention mechanisms allow
LLMs to weigh the importance of different words in a sequence when processing a specific
word. Self-attention enables the model to look at all other words and calculate "attention
weights," determining how much influence each word should have on the interpretation of the
current word, crucial for understanding long-range dependencies (e.g., linking a pronoun to its
antecedent far back in the text). This often involves assigning Query (Q), Key (K), and Value (V)
vectors to each word's representation. Many Transformers use Multi-Head Attention, running
multiple attention calculations in parallel to capture a more nuanced understanding.
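Self-attention can be sketched in a few lines of plain Python. The three tokens and their 2-dimensional Q/K/V vectors below are invented toy values (real models learn these and use many more dimensions per head); the mechanics — dot products, scaling by the square root of the key dimension, softmax, and a weighted sum of values — follow the standard scaled dot-product formulation:

```python
import math

# Minimal sketch of scaled dot-product self-attention for three tokens.
# The Q/K/V vectors are invented 2-dimensional toy values.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys    = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values  = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
d_k = 2  # dimension of the key vectors

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q):
    # Score the query against every key, scaled by sqrt(d_k)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
    weights = softmax(scores)  # the "attention weights" sum to 1
    # The output is the attention-weighted average of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

outputs = [attend(q) for q in queries]
```

Each token's output thus blends information from every other token in proportion to its attention weight — exactly how a pronoun can "look back" at its distant antecedent.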

Positional Encoding

Since Transformers process words in parallel, they don’t inherently know word order.
Positional encodings are mathematical signals added to each word’s representation to
provide information about its position in the sequence, which is vital for meaning (e.g., “dog
bites man” vs. “man bites dog”).
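As a concrete sketch, the sinusoidal encoding from the original Transformer paper can be computed directly (the 8-dimension size here is an arbitrary toy choice):

```python
import math

# Sketch of the sinusoidal positional encoding from "Attention Is All
# You Need": each position gets a unique pattern of sine/cosine values
# that is added to the token's embedding so word order is not lost.

def positional_encoding(position, d_model=8):
    pe = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        pe.append(math.sin(angle))  # even dimension: sine
        pe.append(math.cos(angle))  # odd dimension: cosine
    return pe

# The same token receives a different encoding at different positions,
# which is how "dog bites man" stays distinct from "man bites dog".
pe0 = positional_encoding(0)
pe2 = positional_encoding(2)
```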

Tokens: The Basic Units of Text

LLMs don’t process language word by word or character by character; instead, they break text
into smaller units called tokens. A token can be a whole word, part of a word, a single
character, or punctuation. This process, tokenization, allows models to handle large
vocabularies and new words efficiently.
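A toy sketch can show why subword tokenization helps with rare words. The vocabulary and greedy longest-match splitter below are invented for illustration; production tokenizers (e.g., byte-pair encoding) learn their vocabularies from data:

```python
# Toy illustration of subword tokenization. This hypothetical vocabulary
# and greedy longest-match splitter are for illustration only.

vocab = {"anti", "micro", "bial", "resist", "ance", "the", "is", "rising"}

def tokenize(word):
    """Greedily split a word into the longest known subword pieces."""
    tokens, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            piece = word[start:end]
            # Fall back to a single character if no known piece matches
            if piece in vocab or end - start == 1:
                tokens.append(piece)
                start = end
                break
    return tokens

print(tokenize("antimicrobial"))  # -> ['anti', 'micro', 'bial']
```

A word the model has never seen whole, like "antimicrobial", is still handled gracefully as familiar pieces, which keeps the vocabulary compact.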

Parameters: The Learned Knowledge

Parameters are the internal variables or “settings” that an LLM learns during training. Think of
them as the connections in the model’s neural network that store the knowledge derived from
data, representing patterns and linguistic rules. Models can have millions to trillions of
parameters; generally, more parameters allow for learning more complex patterns and
potentially better performance, but also mean the model is larger, requires more
computational power, and increases costs. It’s not just about quantity; training data quality and
architecture are also crucial. Users might interact with adjustable parameters like
“Temperature” (randomness/creativity) or “Top-p”.
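How the "Temperature" setting reshapes the model's output distribution can be sketched with a softmax over invented token scores (the scores are placeholders, not from a real model):

```python
import math

# Sketch of how "Temperature" reshapes next-token probabilities.
# The raw scores below are invented for illustration.

scores = {"malaria": 2.0, "tuberculosis": 1.5, "dengue": 0.5}

def apply_temperature(scores, temperature):
    exps = {t: math.exp(s / temperature) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

cautious = apply_temperature(scores, 0.2)  # low T: sharply favors the top token
creative = apply_temperature(scores, 2.0)  # high T: flatter, more varied sampling
```

At low temperature the distribution concentrates on the single most likely token (predictable output); at high temperature the probabilities flatten, so sampling picks less likely tokens more often (more "creative" output).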

Context Window: The Model’s “Short-Term Memory”

The context window (or context length) is the amount of text (in tokens) an LLM can consider at
any single point when processing input and generating output. It’s like the model’s short-term
memory. A larger context window means the model can “remember” more of the preceding
text, which is vital for coherence in long dialogues or documents. Leading models have context
windows from 128,000 up to 1 million tokens or more. However, larger windows need more
computational resources and models might struggle to effectively use information in the middle
of very long contexts.

Modality: Beyond Text

While traditional LLMs were text-only, a significant frontier is multimodal models. Modality
refers to the type of data an AI can process or generate, including text, image, audio, video, and
others. Multimodal LLMs (MLLMs) can understand, process, and generate information across
multiple modalities simultaneously. They typically use specialized encoders for each modality,
convert data into common numerical formats (embeddings), and then fuse these
representations for the core LLM backbone to process. This allows for a more holistic
understanding, similar to human senses.

Table 1: Core LLM Concepts Explained

| Concept | Simple Definition | Analogy | Practical Implication |
|---|---|---|---|
| Token | The basic unit of text (word, part of word, character) that an LLM processes. | LEGO brick / Building block | Affects how text is processed, vocabulary handling, and context window limits. |
| Parameter | Internal "settings" or learned knowledge connections within the LLM's neural network. | Brain connections / Learned knowledge | Determines the model's capacity for complexity, nuance, and performance. More parameters generally mean higher capability but also higher cost. |
| Context Window | The amount of text (in tokens) the LLM can actively consider or "remember" at any given time. | Short-term memory / Attention span | Impacts coherence in long texts/conversations and the ability to process large documents. Larger windows improve context but increase computational cost. |
| Transformer | A neural network architecture that enables parallel processing of entire sequences. | Reading a whole page at once vs. word-by-word | Enables training on massive datasets and handling long-range dependencies in text. |
| Attention | A mechanism allowing the model to weigh the importance of different parts of the input. | Focusing on key words in a sentence | Improves contextual understanding and relevance of outputs. |
| Modality | The type of data an AI model can process or generate (e.g., text, image, audio). | Human senses (sight, hearing) | Enables models to understand and integrate information from diverse sources for richer analysis. |

Practical Implications: Capabilities and Limitations

These core concepts define LLMs’ capabilities and limitations:

• Capabilities:
○ Enhanced Contextual Understanding: Transformers and attention lead to
coherent outputs.
○ Versatility: LLMs can perform many language tasks (generation, summarization,
Q&A).
○ Handling Complexity: Large parameter counts enable tackling complex tasks.
○ Multimodal Processing: Integrating text, image, audio, and video opens new
possibilities.
○ Efficiency: Parallel processing allows faster training and inference.
• Limitations:
○ Context Window Constraints: Finite context can cause “forgetting” in long texts.
○ Factual Inaccuracy (Hallucinations): LLMs can produce plausible but incorrect
information; verification is essential.
○ Bias: LLMs can inherit and amplify biases from training data.
○ Computational Cost: Large LLMs are expensive to train and run.
○ Lack of True Understanding: LLMs use pattern matching, not genuine
comprehension or common sense.
○ Catastrophic Forgetting: Fine-tuning can make models lose previous skills.
○ Steerability Issues: Models may not always follow instructions perfectly.

Understanding these trade-offs is critical. The power of the Transformer architecture enables
impressive abilities but also brings challenges like cost and ensuring reliability. Outputs require
verification, especially in high-stakes health applications.

Consultant’s Corner: Leveraging Core Concepts


As a global health consultant, understanding these core concepts is not just academic;
it’s strategically vital. When a client organization considers adopting LLM technology:

• Tool Evaluation: Your knowledge of ‘context windows’ will help you assess
if an LLM can handle the length of their project reports or interview
transcripts. For instance, if a client needs to analyze lengthy annual health
reports (e.g., 200 pages), a model with a small context window would be
impractical without significant pre-processing, like breaking documents
into chunks. You can advise on the trade-offs between models with larger
context windows (potentially higher cost, slower) versus pre-processing
strategies.
• Explaining Limitations: You can clearly articulate why an LLM might
‘hallucinate’ (due to its pattern-matching nature, not a knowledge base)
and the necessity of human oversight for outputs, especially when dealing
with patient data or public health recommendations.
• Advising on Customization: Understanding ‘parameters’ and ‘fine-tuning’
(which will be covered in more detail later) allows you to discuss the
potential for tailoring models to specific global health terminologies or
contexts, even if recommending no-code/low-code approaches for
implementation.
• Managing Expectations: You can set realistic expectations regarding what
LLMs can and cannot do, preventing disappointment and ensuring the
technology is applied effectively to solve real problems.

Hands-on Practice & Example:

• Learning Objective: To understand the practical implications of context window size in
LLM selection for a common consultancy task.
• Scenario: A client, a global health NGO, wants to use an LLM to summarize 50 project
evaluation reports, each averaging 70 pages (approximately 35,000 words or roughly
47,000 tokens per report, assuming ~1.33 tokens per word). They are considering three
LLMs:
○ LLM A: Max context window of 8,000 tokens.
○ LLM B: Max context window of 128,000 tokens.
○ LLM C: Max context window of 1,000,000 tokens.
• Task:
1. Explain how the ‘context window’ limitation of each LLM would affect its ability
to summarize one entire 70-page report in a single pass.
2. What strategies would you recommend to the client for using LLM A to
summarize these reports?
3. What are the potential trade-offs (e.g., cost, processing time, summary quality)
when comparing the use of LLM B versus LLM C for this task, assuming both can
process the entire document?
• Expected Outcomes/Reflection Questions:
○ The user should identify that LLM A cannot process a whole report at once.
○ The user should suggest strategies like serial summarization of chunks, or a
map-reduce approach for LLM A.
○ The user should discuss that while LLM C offers a larger window, LLM B might be
sufficient and potentially more cost-effective or faster if the entire 1M token
capacity isn’t strictly needed and if models struggle with information in the
middle of very long contexts.
○ How does this understanding directly inform the advice you would give the client
regarding tool selection and workflow design?
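The chunking arithmetic behind the expected answer can be sketched directly. The token counts come from the scenario; the 2,000-token reserve for instructions and the generated summary is an illustrative assumption, not a fixed rule:

```python
import math

# Back-of-envelope chunking math for the scenario above.
report_tokens = 47_000  # one 70-page report (scenario estimate)
context_a = 8_000       # LLM A's maximum context window

# Assumed room set aside for the prompt instructions and the summary itself
prompt_and_output_reserve = 2_000

usable = context_a - prompt_and_output_reserve   # tokens left for report text
chunks = math.ceil(report_tokens / usable)

print(f"LLM A needs at least {chunks} chunks per report")
# -> LLM A needs at least 8 chunks per report
```

A serial or map-reduce strategy would then summarize each chunk separately and combine the partial summaries in a final pass, whereas LLM B and LLM C could ingest the whole report at once.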

1.2: The Current LLM Ecosystem
The LLM landscape is diverse and changes rapidly, featuring proprietary systems from major
tech companies and a growing number of powerful open-source alternatives. Understanding
these is crucial for selecting the right tools for global health needs. LLMs can be broadly
categorized into proprietary (closed-source) and open-source models.

• Proprietary (Closed-Source) Models: Developed and controlled by companies like
OpenAI, Google, and Anthropic. Access is typically via APIs or subscriptions, with
limited visibility into their workings.
• Open-Source Models: Architecture and often trained weights are publicly available,
allowing free use, modification, and distribution, fostering community collaboration and
transparency. Platforms like Hugging Face are central repositories.

In-depth Exploration of Leading Proprietary Models

The most prominent proprietary LLMs come from OpenAI (GPT series), Google (Gemini family),
and Anthropic (Claude family).

• GPT Series (OpenAI – e.g., GPT-4, GPT-4o, GPT-4.1):
○ Focus: Pushing performance boundaries in complex reasoning, creativity, and
coding.
○ Features: GPT-4o brought speed, cost-effectiveness, and multimodal
capabilities (text, image, audio). GPT-4.1 further enhanced coding, instruction
following, and expanded context window to 1 million tokens. Access is via API
and services like ChatGPT.
○ Strengths: High performance, strong creative writing, advanced
coding/reasoning (GPT-4.1).
○ Weaknesses: Can be more expensive; earlier versions like GPT-4o sometimes
had “personality” issues; proprietary access.
○ Ideal Use Cases: Complex problem-solving, advanced content creation,
demanding coding tasks.
• Gemini Series (Google – e.g., Gemini 1.5 Pro, Gemini 2.0 Flash, Gemini 2.5 Pro):
○ Focus: Highly capable multimodal models integrated into the Google
ecosystem. Google AI Studio provides access to explore and prototype with
Gemini models.
○ Features: Native multimodality (text, image, audio, video input). Offers versions
optimized for performance (Pro) vs. speed/cost (Flash). Gemini 2.5 Pro has
advanced reasoning and a 1 million token context window.
○ Strengths: Strong multimodal understanding, consistency, accuracy, good
coding/reasoning, excels at understanding local/cultural nuances, integration
with Google tools.
○ Weaknesses: Can be less “playful” than GPT; proprietary access.

○ Ideal Use Cases: Multimedia analysis, research requiring high accuracy,
coding, translation/localization.
• Claude Series (Anthropic – e.g., Claude 3 Haiku/Sonnet/Opus, Claude 3.5/3.7
Sonnet):
○ Focus: AI safety, reliability, ethical considerations, steerable outputs.
○ Features: Strong vision (image analysis) capabilities. Offers models balancing
speed, cost, and capability (Haiku: fast/affordable, Sonnet: balanced, Opus:
most capable). Claude 3.7 Sonnet features “extended thinking” for improved
accuracy. Context window around 200,000 tokens; supports agentic workflows.
○ Strengths: High reliability, safety focus, excels at structured tasks, analysis,
summarization, strong coding (especially 3.7 Sonnet), good at handling long
context, clear outputs.
○ Weaknesses: Can sometimes be more verbose or less creative than GPT;
proprietary access.
○ Ideal Use Cases: Enterprise applications needing safety/reliability, processing
long documents, analytical reasoning, coding, content structuring.

Table 2: Comparison of Leading Proprietary LLMs (Mid-2025 Snapshot)

(Note: Capabilities and context windows evolve rapidly. This table reflects information available around mid-2025 based on cited sources in the Curated Resource Report.)

| Feature/Aspect | GPT Series (e.g., 4.1 / 4o) | Gemini Series (e.g., 2.5 Pro / 2.0 Flash) | Claude Series (e.g., 3.7 / 3.5 Sonnet) |
|---|---|---|---|
| Key Focus | Cutting-edge performance, reasoning, creativity, coding | Multimodality, reasoning, Google ecosystem integration | Safety, reliability, steerability, analysis, coding |
| Multimodality | Strong (text, image, audio input/output – GPT-4o/4.1) | Native & strong (text, image, audio, video input) | Strong vision (text, image input) |
| Noted Strengths | Complex reasoning, creativity, advanced coding (4.1) | Accuracy, consistency, contextual/cultural nuance, coding | Reliability, safety, structured tasks, long context, coding |
| Noted Weaknesses | Higher cost (vs. 4o), potential personality issues (4o) | Less creative flair (vs. GPT) | Can be verbose, less creative (vs. GPT) |
| Context Window (Max) | ~1,000,000 tokens (GPT-4.1) | ~1,000,000 tokens (Gemini 2.5 Pro) | ~200,000 tokens (Claude 3 family) |
| Ideal Use Cases | Complex problem-solving, advanced content/code generation | Multimedia analysis, research, coding, localization | Enterprise use, long document analysis, coding, summarization |
| Access | Proprietary (API, subscription) | Proprietary (API, Google Cloud, subscription) | Proprietary (API, subscription) |

Key Open-Source Models

The open-source LLM movement offers powerful alternatives.

● Llama Series (Meta – e.g., Llama 3, Llama 4):


o Llama 3 models (8B and 70B parameters) showed strong performance, trained
on over 15T tokens using a tokenizer with a 128K-token vocabulary, and are good for reasoning and coding.
Llama 4 promises massive context windows (up to 10M tokens). Available under
a permissive community license.
● Gemma Series (Google – e.g., Gemma 2, Gemma 3):
o Lightweight, state-of-the-art open models derived from Gemini research.
Gemma 3 (1B to 27B parameters) is suitable for diverse hardware. Features
include a 128k context window, support for 140+ languages, and function
calling. Specialized variants like CodeGemma (programming), PaliGemma 2
(vision), and ShieldGemma 2 (AI safety) exist. Typically under Apache 2.0
license. (Flan-T5 is an earlier influential open-source model from Google, known
for its instruction fine-tuning capabilities).
● Other Impactful Models:
o Mistral AI: Models like Mixtral 8x22B are known for efficiency (Mixture-of-Experts
architecture) and function calling.
o Technology Innovation Institute (TII): Falcon models.
o Alibaba: Qwen series, including large and multimodal variants.
o DeepSeek AI: Models focusing on reasoning and efficiency.
o xAI: Grok (multimodal version available).
o Older models like BLOOM and GPT-NeoX also contribute.
o Vicuna (fine-tuned Llama) is noted for conversational abilities.

Practical Access to Open-Source Models:

● Hugging Face: The central hub for accessing, sharing, and collaborating on open-
source models, datasets, and tools. You can often try models directly on their platform
via “Spaces” (interactive demos).

● No-Code/Low-Code Platforms: Several platforms are emerging that provide user-
friendly interfaces to run and fine-tune open-source models without extensive coding.
(Specific examples will be explored in later modules focused on RAG and
customization).
● Local GUIs: Tools like LM Studio allow users to download and run various open-source
models locally on their personal computers (Mac, Windows, Linux) if they have
sufficient processing power (especially GPU) and RAM. This provides a no-code way to
experiment with different models.

Comparative Analysis: Open Source vs. Proprietary

Choosing between open-source and proprietary LLMs involves trade-offs crucial for global
health applications.

Table 3: Open Source vs. Proprietary LLMs: Key Trade-offs

| Criteria | Open Source | Proprietary |
|---|---|---|
| Cost Structure | Low/no license fees; higher operational/infrastructure costs | Usage-based fees or subscriptions; infrastructure included |
| Customization | Full control; modify architecture; fine-tune on private data | Limited customization; vendor-dependent features |
| Performance | Varies; can match/exceed proprietary on specific tasks (fine-tuned) | Often leads on general benchmarks; cutting-edge capabilities |
| Data Control / Privacy | Full control; data stays in-house | Relies on vendor policies; data typically sent to vendor |
| Security Responsibility | Organization responsible for all security measures | Primarily vendor's responsibility (with user config options) |
| Support | Community-driven; documentation; consultants | Dedicated vendor support; SLAs; training programs |
| Implementation Effort | Higher; requires ML expertise & infrastructure | Lower; easier API integration; less infrastructure needed |
| Strategic Independence | High; no vendor lock-in | Lower; dependent on vendor decisions, pricing, service continuity |
For global health, data privacy and control often favor open-source models if technical expertise
and resources are available. Cost can also lean toward open source, considering total cost of
ownership. If top performance and dedicated support are key, and data privacy concerns can be
addressed, proprietary models may be preferred.

Consultant’s Corner: Advising on LLM Adoption in Global Health


An NGO client approaches you, interested in using an LLM to analyze sensitive patient
feedback from a mental health program in a low-resource setting. They have a limited
budget but a technically capable volunteer IT team.

● Open Source Considerations: You would highlight that an open-source
model, hosted on their own secure server (or a trusted private cloud),
model, hosted on their own secure server (or a trusted private cloud),
would ensure patient data never leaves their control, crucial for ethical
practice and data privacy regulations. The volunteer IT team could
potentially manage the setup and maintenance. The Llama or Gemma
series, accessed via Hugging Face and run using a tool like LM Studio for
initial exploration or a more robust local deployment, could be viable
starting points.
● Proprietary Considerations: If they lacked strong IT support or needed the
absolute highest performance for complex sentiment analysis out-of-the-
box, you might discuss a proprietary model. However, you would heavily
emphasize the need to scrutinize the vendor’s data usage and privacy
policies (e.g., HIPAA compliance statements if applicable, data processing
locations, whether data is used for future model training). The cost of API
calls for analyzing a large volume of feedback would also need careful
budgeting.
● Recommendation Leaning: Given the data sensitivity and potential for
technical support, you might guide them towards exploring open-source
options first, while ensuring they understand the responsibilities for
security and maintenance. You could suggest a pilot project using a
smaller open-source model to assess feasibility.

Client Advisory Tip: Scrutinize Data Usage Policies


“When considering proprietary LLMs like ChatGPT, Gemini, or Claude for any project
involving potentially sensitive health data – even anonymized or aggregated data – it is
imperative to thoroughly review their data usage and privacy policies. Specifically,
understand where your data will be processed, how it is stored, whether it will be used
to train future versions of their models, and what provisions they offer for data deletion.
For some services, there might be options for ‘zero data retention’ or enterprise-level
agreements with stronger privacy commitments, but these might come at a higher cost.
Always err on the side of caution and assume any data entered could be seen or used by
the provider unless explicitly stated otherwise in a legally binding agreement.”

Ethical Consideration: Data Sovereignty in Global Health
The choice between proprietary and open-source LLMs has significant ethical implications for
data sovereignty in global health. Sending health data, even if de-identified, to servers
controlled by commercial entities in other countries can raise concerns about who ultimately
controls that data and under what legal frameworks it is protected. Open-source models, when
deployed within an organization’s own infrastructure or within the country of operation on a
trusted cloud, can offer a way to maintain data sovereignty, aligning better with the health data
governance principles of many nations and communities.

Selection Criteria for Non-Technical Professionals


Non-technical professionals should focus on these practical factors:

1. Define the Task: What will the LLM do (e.g., summarize research, draft proposals,
analyze patient feedback)? Does it need creativity, factual accuracy, medical
knowledge, coding, or image/audio processing?
2. Assess Accuracy and Reliability Needs: How critical is factual correctness? What are
the consequences of errors?
3. Evaluate Performance Requirements (Speed): Does the application need immediate
responses (e.g., patient chatbot) or is some delay acceptable (e.g., report draft)?
4. Consider Data Sensitivity and Privacy: What data will be processed? Does it include
Protected Health Information (PHI)? Are there regulations like GDPR or HIPAA? Does
data need to stay in-house?
5. Analyze Budget and Resources: What’s the budget for fees, hosting, and
infrastructure? What technical expertise is available?
6. Determine Ease of Use and Support Needs: How easily can it integrate into workflows?
How much technical support is needed?
7. Review Licensing and Usage Rights: Are there restrictions on commercial use or
modifications?

There’s no single “best” LLM; the optimal choice is contextual. The field’s rapid evolution
means focusing on these criteria is key.
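One way to make these criteria actionable for a client is a simple weighted decision matrix. The weights and 1-5 scores below are hypothetical placeholders that an organization would replace with its own judgments; the method, not the numbers, is the point:

```python
# Hypothetical decision matrix applying the selection criteria above.
# Weights (summing to 1.0) and 1-5 scores are invented placeholders.

weights = {
    "accuracy": 0.30, "privacy": 0.30, "cost": 0.20,
    "ease_of_use": 0.10, "speed": 0.10,
}

candidates = {
    "proprietary_api":   {"accuracy": 5, "privacy": 2, "cost": 2,
                          "ease_of_use": 5, "speed": 4},
    "open_source_local": {"accuracy": 4, "privacy": 5, "cost": 4,
                          "ease_of_use": 2, "speed": 3},
}

def weighted_score(scores):
    # Sum of criterion score times its importance weight
    return sum(weights[c] * scores[c] for c in weights)

ranked = sorted(candidates, key=lambda m: weighted_score(candidates[m]),
                reverse=True)
print(ranked)  # highest-scoring option first
```

With privacy weighted heavily, as in this sketch, the locally hosted open-source option comes out ahead; shifting weight toward ease of use would change the ranking, which is exactly the trade-off conversation to have with a client.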

Table 4: Open Source vs. Proprietary LLMs: Considerations for Selection Criteria

| Criteria | Open Source (Pros) | Open Source (Cons) | Proprietary (Pros) | Proprietary (Cons) |
|---|---|---|---|---|
| Define the Task | Flexible customization for specific tasks (e.g., specialized medical models). | Requires technical expertise to fine-tune. | Pre-trained for general and some specialized tasks with high performance. | Limited customization; fixed outputs without fine-tuning. |
| Accuracy & Reliability | Can be tailored and fine-tuned for specific accuracy requirements. | Quality may vary; community models might not undergo rigorous testing. | Typically high accuracy with standardized benchmarks. | Prone to hallucinations if not specifically tuned for medical/global health tasks. |
| Performance (Speed) | Can be optimized for speed with local deployment, reducing latency. | Optimizing performance requires substantial technical expertise. | Typically optimized for speed through proprietary infrastructure. | Limited control over performance adjustments. Cloud-based models may experience latency. |
| Data Sensitivity & Privacy | Data can remain entirely within the organization's infrastructure. | Risk of security vulnerabilities if not properly configured. | Usually compliant with data regulations (e.g., HIPAA) and offers secure environments. | Data is often processed externally, raising potential privacy concerns, especially with PHI. |
| Budget & Resources | No licensing fees; lower cost for long-term use. | Requires in-house technical resources for setup, maintenance, and fine-tuning. | Support services and infrastructure included in subscription. | High licensing and usage fees, particularly for API calls and large-scale applications. |
| Ease of Use & Support | Active community support; open documentation. | Lack of official support; troubleshooting relies on community or in-house expertise. | Vendor support and dedicated resources for troubleshooting and integration. | Reliance on vendor support, which may be slow or costly for specialized use cases. |
| Licensing & Usage Rights | Usually permissive (e.g., Apache, MIT), allowing modification and commercial use. | Some models may have restrictive licenses or require attribution. | Clearly defined commercial use cases and vendor accountability. | Restrictions on modification; licensing can limit use in some global health applications. |

Hands-on Practice & Example:

● Learning Objective: To practice applying selection criteria for LLMs in a global health
context.
● Task: Review the data usage policies of ChatGPT (OpenAI’s general policy for consumer
services), Google’s Gemini (through AI Studio or their general terms), and Anthropic’s
Claude (their standard terms). You can search for “[Tool Name] data usage policy” or
“[Tool Name] privacy policy.”
o Identify one key statement from each policy regarding how user data might be
used by the company (e.g., for model improvement, service provision, etc.).
o Based on these statements, what are the top 1-2 data usage concerns a global
health organization should have before its staff uses the free or standard
versions of these tools for drafting internal project documents that might
contain summarized (but not directly identifiable patient) sensitive
programmatic information (e.g., “challenges in TB adherence in District X
relating to social stigma”)?
● Expected Outcomes/Reflection Questions:
o The user should locate and reference specific policy clauses.
o Concerns might include data being used for model training, data retention
periods, and the geographical location of data processing.
o How would these concerns change if the data involved directly identifiable
patient information?
o What steps could the organization take to mitigate these concerns if they still
wish to leverage these tools? (e.g., stricter internal usage guidelines, exploring
enterprise versions with different data policies).

1.3: Understanding the Frontier: Specialized LLMs and Multimodal LLMs
While general-purpose LLMs are versatile, the cutting edge involves models with deeper
domain expertise and the ability to process information beyond text. These hold particular
promise for global health.

Beyond General Purpose: The Need for Specialization

General-purpose LLMs, trained on diverse internet data, are versatile but can lack depth and
accuracy in specialized fields like medicine, which have specific terminology and require high
factual precision. A general model might struggle with nuanced clinical notes without
significant fine-tuning. This drives the development of domain-specific or purpose-built LLMs,
pre-trained or extensively fine-tuned on field-specific data (e.g., biomedical literature, EHRs).
This focused training leads to more accurate and relevant outputs. Another approach is custom
LLMs, where a general model is fine-tuned on an organization’s specific dataset. General
models offer breadth, specialized models offer depth, and custom models offer tailored
performance. For global health, specialized or custom models often outperform general ones
where accuracy and context are paramount.

Spotlight on Healthcare: Specialized LLMs in Action

Healthcare is active in LLM specialization.

● Early Examples (BioBERT, PubMedBERT): Built on BERT, these were pre-trained on


biomedical literature (PubMed abstracts, full-text articles). This improved performance
on tasks like Named Entity Recognition (identifying genes, diseases), Relation Extraction
(gene-disease associations), and Question Answering in biomedical text.
● Advanced Models (Med-PaLM, Med-PaLM 2): Google’s large-scale models for the
medical domain, trained on curated medical data and evaluated against medical
licensing exam questions (achieving expert-level scores). A related variant,
Med-PaLM M, is multimodal, designed to integrate clinical language, medical
imaging, and genomic data.
● Other Specialized Efforts: Models integrated into EHR systems (e.g., Epic with
Microsoft/GPT-4). Google’s AMIE aims for empathetic diagnostic conversations.
Research explores open models like Meditron and BioMistral. The goal is often
supporting clinical decision-making, automating tasks, or accelerating research.

Table 4: Overview of Selected Specialized LLMs for Healthcare


(Illustrative examples; many others exist or are in development)

Model Example | Primary Training Data / Focus | Key Capabilities/Tasks | Developer/Source
BioBERT | Biomedical Literature (PubMed, PMC) | NER (Genes, Diseases, etc.), Relation Extraction, Biomedical QA | Korea Univ. / NLP Core Group
Med-PaLM 2 | Curated Medical Data (Text, potentially Images, Genomics) | Medical QA, Clinical Decision Support, Documentation Generation, Multimodal Analysis | Google
NYUTron | EHR Data (specific institution) | Predicting patient readmissions, clinical outcomes | NYU Langone Health
(Conceptual) EHR-Integrated LLM | EHR Data, Clinical Notes | Data trend identification, personalized care suggestions, note summarization | e.g., Epic / Microsoft (GPT-4)
(Conceptual) Ambient Scribe LLM | Doctor-Patient Conversations, Medical Notes | Generating structured clinical notes from conversations | Various (e.g., Elion Health, Nym)

Use Cases in Global Health Research and Operations

LLMs offer numerous applications for medicine and global health:

● Accelerating Drug Discovery: Analyzing literature and datasets to identify drug targets,
predict interactions, and optimize clinical trials. AI agents based on LLMs can
orchestrate discovery workflows.
● Enhancing Diagnostic Support: Assisting healthcare workers in resource-limited
settings by analyzing symptoms, history, and test results (potentially images) to suggest
diagnoses or areas for investigation. This augments, not replaces, clinicians.
● Improving Public Health Surveillance (Infoveillance): Monitoring social media, news,
etc., in multiple languages to detect disease outbreaks, track sentiment on
interventions (e.g., vaccinations), understand health behaviors, and combat
misinformation.
● Synthesizing Research and Evidence: Rapidly processing and summarizing research
papers, reports, and guidelines to help users stay updated and generate evidence-
based recommendations.
● Streamlining Administrative Tasks: Automating tasks like claims processing,
scheduling, drafting correspondence, and generating clinical documentation (e.g.,
ambient scribing from consultations).
● Facilitating Patient Education and Communication: Generating clear explanations of
health conditions/treatments in multiple languages, powering chatbots for health
questions.

Multi-Modal LLMs (MLLMs): AI That Sees and Hears

MLLMs process and integrate multiple data types (text, images, audio, video). They use
specialized encoders for each modality and fuse the resulting representations into a
common format, allowing the core LLM backbone to reason across the combined
information. This enables tasks like describing images, answering questions about
videos, or generating images from text. Leading examples include GPT-4o, the Claude 3
family, and the Gemini series. Open-source examples include LLaVA and Meta’s
ImageBind (a multimodal embedding model). Because MLLMs take in information more like
humans do, they enable more natural interaction and enhanced problem-solving.
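The encode-fuse-reason pattern described above can be illustrated with a toy sketch. Everything here is a simplified assumption for teaching purposes: the function names, the four-dimensional vectors, and the averaging "fusion" stand in for the learned encoders and projection layers a real MLLM would use.

```python
# Conceptual sketch of the multimodal "encode, fuse, reason" pattern.
# All names and the 4-dimensional embeddings are illustrative only.

def encode_text(text: str) -> list[float]:
    # Stand-in for a text encoder: real models map tokens to dense vectors.
    return [len(text) % 7, float(text.count(" ")), 1.0, 0.0]

def encode_image(pixels: list[int]) -> list[float]:
    # Stand-in for a vision encoder (e.g., a ViT-style patch encoder).
    return [sum(pixels) / max(len(pixels), 1), float(len(pixels)), 0.0, 1.0]

def fuse(embeddings: list[list[float]]) -> list[float]:
    # Simplest possible fusion: element-wise average into one shared space.
    # Real MLLMs learn projection layers so all modalities share one space.
    dims = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dims)]

# The fused representation is what the LLM "backbone" would reason over.
fused = fuse([encode_text("ulcer on left forearm"), encode_image([12, 40, 200])])
print(len(fused))  # → 4: one shared representation, regardless of input modality
```

The design point the sketch preserves is that the downstream model never sees raw pixels or raw text, only vectors in a single shared space, which is what lets one backbone reason across modalities.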

Capabilities and Potential Global Health Applications of Multimodal AI

MLLMs offer powerful new avenues for global health:

● Integrated Diagnostics: Analyzing medical images (X-rays, ultrasounds) with clinical


notes or patient history for comprehensive analysis, potentially identifying subtle
correlations and providing robust diagnostic support, especially where specialists are
scarce.
● Telemedicine and Remote Monitoring: Patients could submit images/videos of
symptoms (skin conditions, wounds) with text/audio descriptions for initial assessment,
triage, or remote consultations.
● Analysis of Diverse Health Data: Analyzing survey responses (text), photos of field
conditions (images), audio of community interviews, videos of health worker training to
synthesize insights more effectively.
● Enhanced Health Education and Training: Creating engaging training materials by
combining text, images, diagrams, audio narration, or video demonstrations.
● Accessibility Tools: Describing medical images, charts, or medication labels for
visually impaired individuals.

Successful implementation requires high-quality, diverse training data, robust reliability and
safety methods, privacy adherence, and careful workflow integration. Specialized models offer
depth but may lack breadth. The multimodality of healthcare data makes MLLMs a relevant
frontier but demands careful validation.

Strategic Insight: Proposing Specialized LLM Solutions


As a consultant, understanding specialized LLMs allows you to propose innovative
solutions. For example, a Ministry of Health client struggles with ensuring
consistent application of new national malaria treatment guidelines across diverse
cadres of health workers, many in remote areas with intermittent internet.

● The Challenge: Generic LLMs might not know the specific national
guidelines or might provide information based on other countries’
protocols.

● The Specialized Solution Idea: You could propose the development of a
custom, lightweight LLM (or a sophisticated RAG system, which we’ll cover
later) fine-tuned specifically on the country’s official malaria treatment
guidelines, related national health policies, and training materials. This
specialized tool could then be embedded into a simple mobile application
for health workers.
● The Value Proposition: Health workers could query the application in
natural language (even offline, if the model is small enough to be on-
device) to get immediate, accurate advice based only on the approved
national guidelines. This improves adherence, standardizes care, and acts
as a persistent job aid. You would emphasize how this goes beyond what a
general-purpose tool like standard ChatGPT could reliably offer for this
specific, high-stakes use case. This leverages the concept of bringing
focused, expert knowledge directly to the point of care.
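The guideline-grounding idea in this scenario can be sketched with a minimal retrieval step. The guideline snippets below are hypothetical placeholders, and the word-overlap scoring is a deliberate simplification; a real system would retrieve from the country's actual guideline text using embedding similarity.

```python
# Minimal sketch of retrieval-augmented grounding against a fixed guideline
# corpus. Snippets and scoring are illustrative placeholders only.

GUIDELINES = {
    "uncomplicated_malaria": "First-line treatment for uncomplicated malaria is ACT per national protocol.",
    "severe_malaria": "Severe malaria requires parenteral artesunate and urgent referral.",
    "pregnancy": "Treatment in the first trimester follows special national precautions.",
}

def retrieve(query: str, corpus: dict[str, str]) -> str:
    # Score each snippet by simple word overlap with the query.
    # Real systems use embedding similarity; this keeps the idea visible.
    q_words = set(query.lower().split())
    best_key = max(corpus, key=lambda k: len(q_words & set(corpus[k].lower().split())))
    return corpus[best_key]

def build_prompt(query: str) -> str:
    # The retrieved snippet is injected so the model answers ONLY from the
    # approved guideline text, not from its general training data.
    context = retrieve(query, GUIDELINES)
    return f"Answer using only this guideline excerpt:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is first-line treatment for uncomplicated malaria"))
```

The value proposition above maps directly onto this structure: the "only the approved national guidelines" guarantee comes from constraining the prompt to retrieved official text, not from the model's own knowledge.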

Ethical Consideration: Multimodal AI in Global Health


The use of MLLMs in global health introduces unique ethical challenges:

● Data Quality and Bias: Visual and audio data can carry significant biases. For example,
skin condition diagnostic models trained predominantly on lighter skin tones may
perform poorly on darker skin. Ensuring diverse and representative multimodal datasets
is critical but challenging.
● Privacy of Multimodal Data: An image of a patient’s home environment during a
telehealth call, or a recording of their voice, can reveal much more sensitive information
than text alone. Secure collection, storage, and processing protocols are paramount.
● Interpretability and Trust: If an MLLM combines an image and text to suggest a
diagnosis, and it’s incorrect, understanding why it made that error can be even more
complex than with text-only models. Building trust with clinicians and patients requires
transparency about these limitations.

Hands-on Practice & Example:

● Learning Objective: To conceptualize the application of multimodal LLMs to a global


health challenge.
● Scenario: A client is a research institution managing a large, longitudinal child nutrition
study in several rural districts. They have collected:
o Text-based survey data on household food security and child feeding practices.
o Thousands of images of children’s meals taken by caregivers on basic mobile
phones (varying quality).
o Audio recordings of focus group discussions with mothers about barriers to
nutritious feeding.
● Task:
1. How could a multi-modal LLM (conceptually) be used to gain deeper, more
integrated insights from these diverse data types compared to analyzing each

one in isolation? For example, could it help link observed feeding practices
(audio/text) with the visual content of meals (images)?
2. What are three key ethical or practical challenges the client would need to
address before implementing such an MLLM-based analysis? (e.g., consent for
image/audio use, image quality issues, potential biases in interpreting meal
content across different cultural food preparations).
● Expected Outcomes/Reflection Questions:
o The user should be able to suggest integrated analyses, e.g., correlating themes
from audio discussions with visual evidence in meal photos and survey data.
o Challenges might include: ensuring informed consent for multimodal data
collection and AI analysis, technical difficulties with variable image quality,
algorithmic bias in identifying food types or portion sizes across diverse local
cuisines, and privacy concerns if images or audio inadvertently capture
identifiable individuals or sensitive household contexts.
o What kind of expertise would the client need on their team or as partners to
responsibly explore this MLLM application?

1.4: Staying Updated: Strategies for Continuous Learning


The field of AI, especially LLMs, advances at an unprecedented pace. New models, capabilities,
applications, and ethical considerations emerge constantly. For global health consultants,
staying informed is essential for effective and responsible leverage of these tools.

Why Continuous Learning is Crucial

AI knowledge quickly becomes outdated. Capabilities that were science fiction a year ago might
be common today. Professionals need awareness of AI developments to understand impacts
on their work and the populations they serve. For global health consultants, this understanding
is vital for identifying opportunities, mitigating risks, and providing informed advice.

Recommended Resources for Non-Technical Professionals


Navigating AI information requires focusing on accessible and relevant resources.

● Accessible Newsletters and Blogs:


o The Rundown: Daily AI news, projects, tools, tutorials.
o Mindstream: Quick daily updates, news, tools, practical tutorials.
o Ben’s Bites: Daily news curation, weekly deep dives into business use cases.
o Superhuman: Focuses on leveraging AI in careers and daily life.
o The Neuron: Daily breakdowns of industry news, tool lists, humor.
o TLDR AI: Daily short summaries of industry news and academic papers.
o Ethan Mollick’s “One Useful Thing” (practical implications, research
translation).

o Zvi Mowshowitz’s “Don’t Worry About the Vase” (weekly LLM development
analysis).
● Key Organizations and Research Labs:
o Major AI Labs: OpenAI, Google DeepMind (and Google AI), Meta AI, Anthropic
publish blogs and research.
o Hugging Face: Essential for tracking open-source developments via their blog,
platform updates, and model/dataset repositories.
● Relevant Conferences and Journals (Overview):
o Major AI Conferences (often technical, look for summaries): NeurIPS, ICML,
ICLR.
o Applied Conferences: The AI Summit, World Summit AI.
o Key Journals: Nature Machine Intelligence. For healthcare: NEJM AI, The Lancet
Digital Health.
● Online Communities:
o Hugging Face Hub: Explore models, datasets, “Spaces” (demos).
o Reddit: r/MachineLearning, r/LocalLLaMA, r/huggingface.
o Dedicated Platforms: “God of Prompt” for practical application.
● Key Individuals/Thought Leaders: Follow influential researchers and educators on
LinkedIn or X (formerly Twitter) like Andrew Ng, Yann LeCun, Fei-Fei Li, Cassie Kozyrkov.

Table 5: Recommended Resources for Staying Updated (Examples)

Resource Type | Specific Examples | Focus / Frequency | Accessibility (Non-Technical)
Newsletters | The Rundown, Mindstream, Ben’s Bites, The Neuron | Daily/Weekly AI News, Tools, Use Cases | High
Blogs | Ethan Mollick (One Useful Thing), Zvi Mowshowitz (DWATV) | Weekly Analysis, Practical Implications, Research Summary | High
Key Labs (Blogs) | OpenAI Blog, Google AI Blog, Meta AI Blog, Anthropic News | Major Model Releases, Research Breakthroughs | Medium (Announcements High)
Communities | Hugging Face Hub (Spaces), Reddit (r/huggingface) | Model Exploration, Practical Discussion, Q&A | Medium to High
Conferences | NeurIPS, ICML (Summaries); AI Summit (Applied Focus) | Cutting-Edge Research; Industry Applications | Low (Direct) / Medium (Summaries)
Journals | Nature Machine Intelligence; NEJM AI, Lancet Digital Health | Peer-Reviewed Research; AI in Health Applications | Low (Direct) / Medium (Summaries)

Developing Practical Skills

Staying updated involves developing practical skills:

● Prompt Engineering: Crafting effective instructions (prompts) to guide LLMs.


Understand how to phrase questions, provide context, specify formats, and iterate.
Resources like the “Learn Prompting” course can help.
● AI Tool Familiarity: Gain hands-on experience with interfaces like ChatGPT, Gemini,
Google AI Studio, and Claude. Explore AI features in existing tools. Tools like LM Studio
allow running open-source models locally.
● Critical Evaluation: Assess LLM outputs rigorously for accuracy, completeness,
relevance, potential bias, or hallucination. Cross-reference factual claims.
● Data Literacy: Basic understanding of data concepts, how data trains models, and
interpreting AI-presented data.
● Ethical Awareness: Sensitivity to ethical dimensions: bias, privacy, misinformation.
● Collaboration: Learn to work effectively with AI systems and technical teams.
● Adaptability and Lifelong Learning: Commit to continuous learning as tools evolve.
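The prompt-engineering advice above (phrase the question, provide context, specify the format) can be captured in a reusable template. The field names and example values below are this sketch's own convention, not a standard from any tool:

```python
# A reusable prompt template reflecting common prompt-engineering practice:
# give the model a role, relevant context, an explicit task, and an output format.

def build_prompt(role: str, context: str, task: str, output_format: str) -> str:
    return (
        f"You are {role}.\n"
        f"Context: {context}\n"
        f"Task: {task}\n"
        f"Respond in this format: {output_format}"
    )

prompt = build_prompt(
    role="a public health program officer",
    context="A district report notes rising TB treatment default rates.",
    task="List three plausible contributing factors to investigate.",
    output_format="a numbered list with one sentence per factor",
)
print(prompt)
```

Pasting the resulting text into any chat interface (ChatGPT, Gemini, Claude) illustrates the iteration loop: adjust the context or format fields, resubmit, and compare outputs.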

Ethical Considerations When Engaging with LLMs

Ethical vigilance is paramount, especially in global health:

● Bias Awareness: LLMs can perpetuate biases from training data. Be critical of
stereotypical outputs or skewed perspectives.
● Data Privacy: Exercise caution with sensitive information (especially PHI). Understand
provider data policies. Consider solutions with greater data control (e.g., secure open-
source models) for sensitive applications.
● Misinformation and Verification: LLMs can “hallucinate,” generating plausible-
sounding but false content. Verify critical information from LLMs against trusted
sources before relying on it.
● Transparency and Accountability: LLMs are often “black boxes”. Understand
limitations in tracing outputs and assigning accountability.
● Responsible Use: Avoid generating harmful, deceptive, or abusive content. Consider
real-world impacts. Human oversight (“human-in-the-loop”) is often necessary in high-
stakes situations.

Navigating these dimensions is a fundamental professional competence. For global health
consultants, promoting ethical AI use is a core responsibility.

Consultant’s Corner: Building Your “AI Watch” System


As a busy consultant, staying current with AI can feel overwhelming. Here’s a
practical approach to building your personal “AI Watch” system:

1. Curate Your Sources (Start Small): Don’t try to drink from the firehose.
Select 2-3 accessible newsletters (e.g., The Neuron, Ben’s Bites) and one
blog (e.g., One Useful Thing) that resonate with you. Dedicate 15-30
minutes, 2-3 times a week, to scan them.
2. Follow Key Organizations & People: Follow major labs (OpenAI, Google
DeepMind, Meta AI, Anthropic) and a few thought leaders in AI for health on
LinkedIn or X for high-level announcements.
3. Hands-on “Playtime”: Schedule 30-60 minutes bi-weekly or monthly to
experiment with a new feature in a familiar tool (ChatGPT, Gemini, Claude)
or try a new, accessible tool mentioned in your curated sources (e.g., a new
no-code AI platform, or exploring a model on Hugging Face Spaces).
4. Focus on Relevance: As you read or experiment, constantly ask: “How
could this be relevant to [a specific global health challenge, a client type, a
project I’m working on]?” This filters information through your professional
lens.
5. Share & Discuss: If you’re part of a team or community of practice, share
interesting findings or tools. Discussing them deepens understanding. This
structured yet manageable approach helps you stay informed about key
trends and identify emerging AI solutions relevant to global health without
requiring you to become a full-time AI researcher.

Hands-on Practice & Example:

● Learning Objective: To begin curating resources for continuous learning in AI for global
health.
● Task:
1. From the list of “Accessible Newsletters and Blogs” in the Curated Resource
Report (and Table 5 above), select two newsletters that seem most relevant to
your interests. If possible, visit their websites and review their latest few posts or
sign up if they are free.
2. Identify one AI tool, concept, or research finding discussed in these
newsletters/blogs in the last month that you believe could have a tangible
application or implication (positive or negative) for global health consultancy
work.
3. Briefly explain your reasoning: What specific global health problem or
consultancy task could it relate to, and why is it significant?

● Expected Outcomes/Reflection Questions:
o The user identifies two specific newsletters/blogs and demonstrates
engagement with their content.
o The user selects a relevant AI development and articulates a clear connection to
global health consultancy, considering its potential impact.
o How might you integrate regular review of such resources into your existing
professional development routine?
o What criteria would you use to decide if a newly announced AI tool is worth
dedicated time to explore further for your consultancy practice?

Module 1 Summary
This module has laid the groundwork for your journey to becoming an expert LLM consultant in
global health. You’ve delved into the core technical concepts that power LLMs, from the
revolutionary Transformer architecture to the practicalities of tokens, parameters, and context
windows. You’ve navigated the current LLM ecosystem, comparing leading proprietary models
like GPT, Gemini, and Claude with the burgeoning world of open-source alternatives such as
Llama and Gemma, and considered the critical factors for selecting the right approach in a
global health context. Furthermore, you’ve explored the exciting frontier of specialized and
multimodal LLMs, recognizing their immense potential to address complex challenges in
healthcare research and operations. Finally, you’ve equipped yourself with strategies and
resources for continuous learning and underscored the paramount importance of ethical
considerations in all AI engagements.

This foundational knowledge is crucial as you move towards more advanced topics in
subsequent modules, where you will learn to apply these tools for producing PhD-level
technical documents, customizing LLMs for specific global health needs, and leveraging AI in
advanced research methods.

Further Resources (Selected from Curated Resource Report):


● Core Concepts:
o “Attention Is All You Need” (Seminal Transformer paper summary, if desired, or
focus on simpler explanations)
o IBM’s explanations of context window and attention mechanism
● LLM Ecosystem & Comparisons:
o Empler.ai’s “Ultimate Guide to the Latest LLMs”
o Blog posts comparing GPT, Gemini, and Claude (e.g., from Nu10, Chatbase)
o Instaclustr’s “Top 10 open source LLMs”
o Google AI Studio (for Gemini exploration)
o Hugging Face (for open-source model exploration)
● Specialized & Multimodal LLMs:

o JMIR article: “Revolutionizing Health Care: The Transformative Impact of Large
Language Models in Medicine”
o NVIDIA Glossary: “What Are Multimodal Large Language Models?”
● Staying Updated:
o Newsletters: The Rundown, Ben’s Bites, The Neuron
o Blogs: Ethan Mollick’s “One Useful Thing”

Nathan Miller
natemiller33@gmail.com