
Hi,

I have an Nvidia Jetson AGX Orin 64GB, which I bought to experiment with AI.

I have watched many YouTube videos that can be referenced to aid installation and configuration, but I need
help getting these components onto the Jetson. I’ve been a Mac user for 15 years and have a lot of trouble
doing anything on Ubuntu.

Proposed Local AI System Architecture

Speech to text
WhisperX
Voice identification of the spoken context is available.

Emotion & Focus


Facial recognition, emotion detection, gaze tracking and heart rate are provided by already installed
software, to give an understanding of the user’s focus, comprehension and emotional state. The proposed LLM
is trained to be empathetic.

Visual AI
moondream 1.6b

Used for the system to comprehend visual inputs.

Prompt Engine
Guidance
Inputs from speech to text, emotions & focus plus visual AI can be assessed and converted into appropriate
and improved prompts for the LLM.
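
As a rough sketch of what this step could look like in plain Python: it merges the three inputs into one prompt string. It does not use the Guidance API; the Observation fields, the 0.5 focus threshold and the build_prompt function are all names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One snapshot of the three input systems (all field names are hypothetical)."""
    transcript: str  # text from the speech-to-text stage
    emotion: str     # label from the emotion and focus stage
    focus: float     # 0.0 (distracted) to 1.0 (fully focused), an assumed scale
    scene: str       # caption from the visual AI stage


def build_prompt(obs: Observation, advice: str = "") -> str:
    """Merge the three inputs (and optional reviewer advice) into one prompt."""
    lines = [
        "You are an empathetic assistant.",
        f"The user appears {obs.emotion} (focus {obs.focus:.1f} out of 1.0).",
        f"The camera shows: {obs.scene}",
    ]
    if obs.focus < 0.5:  # assumed threshold for "low focus"
        lines.append("Keep the answer short and simple.")
    if advice:
        lines.append(f"Reviewer advice for this attempt: {advice}")
    lines.append(f"The user said: {obs.transcript}")
    return "\n".join(lines)


obs = Observation(
    transcript="How do I install Docker?",
    emotion="frustrated",
    focus=0.4,
    scene="a person at a desk with a laptop",
)
print(build_prompt(obs))
```

The advice parameter is there so a later review step can feed its comments back into the next prompt.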

LLM
ehartford/dolphin-2.2.1-mistral-7b
The model MemGPT recommends for local LLM use.

Quiet-STaR
To encourage the LLM not to rush a response but reflect on its answer.

CRAG / Self RAG


To ensure what is taught is correct.

Storage
The generated LLM output stored for the full session where it can be assessed for improvement before being
given to the user.
Improvement
AlpacaEval 2.0
Reviews all responses generated by the LLM, giving a grade 1-5. The answers to be returned to the prompt
engine if below the desired grade with advice for improvement.
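
The grade-and-retry loop described here, together with the session storage, might be sketched as follows. This is only an illustration under assumptions: generate and grade are hypothetical stand-ins for the LLM call and the reviewer (this is not the AlpacaEval API), and the desired grade of 4 and the limit of 3 attempts are arbitrary choices.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: generate(prompt) calls the LLM, and
# grade(prompt, answer) returns a grade from 1 to 5 plus advice.
Generate = Callable[[str], str]
Grade = Callable[[str, str], Tuple[int, str]]


def answer_with_review(prompt: str, generate: Generate, grade: Grade,
                       min_grade: int = 4,
                       max_attempts: int = 3) -> Tuple[str, List[dict]]:
    """Regenerate until the grade reaches min_grade or attempts run out."""
    session: List[dict] = []  # storage: every attempt is kept for the session
    current = prompt
    for attempt in range(1, max_attempts + 1):
        answer = generate(current)
        score, advice = grade(prompt, answer)
        session.append({"attempt": attempt, "answer": answer,
                        "grade": score, "advice": advice})
        if score >= min_grade:
            break
        # Below the desired grade: go back to the prompt engine with the advice.
        current = f"{prompt}\nReviewer advice: {advice}"
    best = max(session, key=lambda rec: rec["grade"])
    return best["answer"], session


# Toy stubs so the sketch runs without any model installed.
def fake_generate(p: str) -> str:
    return "short answer" if "advice" in p.lower() else "long answer"


def fake_grade(p: str, a: str) -> Tuple[int, str]:
    return (5, "") if a.startswith("short") else (2, "be shorter")


answer, log = answer_with_review("Explain RAG", fake_generate, fake_grade)
print(answer, len(log))  # prints: short answer 2
```

Only the best-graded answer is returned to the user; the full log stays available for later assessment.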

Speech
The LLM output is converted to audio using text to speech.

MemGPT
To continuously build an extensive database about the user from the three input systems of speech, emotion and
visual. It additionally records the output the user has received as spoken content.

[Diagram: blocks labelled Speech to text, Emotion, Visual AI, Prompt Engine, LLM, Quiet-STaR, CRAG / Self RAG, Storage, Improvement, MemGPT and Speech]
