Alpaca + Llama-3 8b full example.ipynb - Colab
To run this, press "Runtime" and then "Run all" on a free Tesla T4 Google Colab instance!
To install Unsloth on your own computer, follow the installation instructions on our Github page here.
You will learn how to do data prep, how to train, how to run the model, and how to save it (e.g. for Llama.cpp).
%%capture
import torch
major_version, minor_version = torch.cuda.get_device_capability()
# Must install separately since Colab has torch 2.2.1, which breaks packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass
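Before adding LoRA adapters, the notebook first loads a 4-bit base model. The loading cell is not captured in this extract; the sketch below shows the usual pattern, where the checkpoint name "unsloth/llama-3-8b-bnb-4bit" and the max_seq_length value of 2048 are assumptions, while max_seq_length, dtype, and load_in_4bit are the variables reused later in the notebook.

from unsloth import FastLanguageModel

max_seq_length = 2048  # assumed value; pick any length you need
dtype = None           # None auto-detects: float16 on T4/V100, bfloat16 on Ampere+
load_in_4bit = True    # 4-bit quantization to reduce memory use

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # assumed checkpoint name
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)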
We now add LoRA adapters so we only need to update 1 to 10% of all parameters!
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
[NOTE] To train only on completions (ignoring the user's input) read TRL's docs here.
[NOTE] Remember to add the EOS_TOKEN to the tokenized output!! Otherwise you'll get infinite
generations!
If you want to use the ChatML template for ShareGPT datasets, try our conversational notebook.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
trainer_stats = trainer.train()
Step  Training Loss
1     1.814600
2     2.293200
3     1.689600
4     1.952800
5     1.645600
6     1.639600
7     1.217400
8     1.246600
9     1.069000
10    1.174000
11    0.963300
12    0.991200
13    0.929300
14    1.050700
15    0.883800
16    0.877100
17    1.006000
18    1.264100
19    0.988600
20    0.885600
21    0.921400
22    0.992100
23    0.859300
24    0.983200
25    1.070400
26    1.012900
27    1.049900
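trainer.train() returns a TrainOutput object, so the trainer_stats variable from the cell above can be inspected once training finishes. The sketch below uses the standard transformers metric keys; it is not output shown in this notebook.

# trainer_stats holds the TrainOutput returned by trainer.train()
print(f"{trainer_stats.metrics['train_runtime']:.1f} seconds used for training.")
print(f"Average training loss: {trainer_stats.metrics['train_loss']:.4f}")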
Inference
Let's run the model! You can change the instruction and input - leave the output blank!
You can also use a TextStreamer for continuous inference - so you can see the generation token
by token, instead of waiting the whole time!
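As a sketch of streamed generation, assuming the alpaca_prompt template defined above and an arbitrary max_new_tokens of 128; the instruction and input match the example output that follows.

from transformers import TextStreamer

FastLanguageModel.for_inference(model)  # Enable Unsloth's native 2x faster inference
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Continue the Fibonacci sequence.",  # instruction
            "1, 1, 2, 3, 5, 8",                  # input
            "",  # output - leave this blank for generation!
        )
    ], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)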
### Instruction:
Continue the Fibonacci sequence.
### Input:
1, 1, 2, 3, 5, 8
### Response:
13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 463
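The saving cell itself is not reproduced in this extract; a minimal sketch, assuming the "lora_model" directory name used in the loading code below:

model.save_pretrained("lora_model")      # Saves only the LoRA adapter weights
tokenizer.save_pretrained("lora_model")  # Saves the tokenizer alongside them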
[NOTE] This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF,
scroll down!
Now if you want to load the LoRA adapters we just saved for inference, change False to True:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "What is a famous tall tower in Paris?", # instruction
            "", # input
            "", # output - leave this blank for generation!
        )
    ], return_tensors = "pt").to("cuda")
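The cell is cut off after tokenizing in this extract; the remaining generation step follows the standard transformers call, with max_new_tokens = 64 as an assumption.

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
print(tokenizer.batch_decode(outputs))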
You can also use Hugging Face's AutoPeftModelForCausalLM. Only use this if you do not have unsloth installed. It can be hopelessly slow, since 4bit model downloading is not supported, and Unsloth's inference is 2x faster.
if False:
    # I highly do NOT suggest this - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")