Training with Unsloth
Unsloth Setup
Setting up Unsloth is straightforward. Let's get your environment ready for fast fine-tuning.
Installation
Standard Installation
pip install unsloth
With Specific CUDA Version
# For CUDA 12.1
pip install unsloth[cu121]
# For CUDA 11.8
pip install unsloth[cu118]
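Not sure which extra to pick? You can check which CUDA toolkit your PyTorch build was compiled against before installing (a quick sketch):
# Check the CUDA version your PyTorch build targets
import torch
print(torch.version.cuda)         # e.g. "12.1" -> pick the cu121 extra
print(torch.cuda.is_available())  # confirm the GPU is actually visible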
Colab Installation
# Run this cell first in Google Colab
!pip install unsloth
Full Installation with Dependencies
pip install unsloth transformers datasets trl peft accelerate bitsandbytes
Verify Installation
# Check Unsloth installation
import unsloth
print(f"Unsloth version: {unsloth.__version__}")
# Verify FastLanguageModel is available
from unsloth import FastLanguageModel
print("FastLanguageModel imported successfully!")
# Check GPU
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
Loading Models with Unsloth
FastLanguageModel
The core class for loading models:
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="unsloth/Llama-3.2-3B-Instruct",
max_seq_length=2048,
load_in_4bit=True,
dtype=None, # Auto-detect (bfloat16 on modern GPUs)
)
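With dtype=None, Unsloth chooses float16 or bfloat16 based on your GPU. If you want to see what your hardware supports before loading, here is a small sketch:
import torch
# bfloat16 needs Ampere (A100, RTX 30xx) or newer; older GPUs fall back to float16
print(torch.cuda.is_bf16_supported())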
Available Pre-optimized Models
Unsloth provides pre-optimized models on HuggingFace:
# Llama models
"unsloth/Llama-3.2-1B-Instruct"
"unsloth/Llama-3.2-3B-Instruct"
"unsloth/Llama-3.3-70B-Instruct"
# Mistral models
"unsloth/Mistral-7B-Instruct-v0.3"
"unsloth/Mixtral-8x7B-Instruct-v0.1"
# Phi models
"unsloth/Phi-4"
# Qwen models
"unsloth/Qwen2.5-7B-Instruct"
"unsloth/Qwen2.5-72B-Instruct"
# Gemma models
"unsloth/gemma-2-9b-it"
Using Standard HuggingFace Models
You can also use regular HuggingFace model IDs:
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="meta-llama/Llama-3.2-3B-Instruct", # Standard HF ID
max_seq_length=2048,
load_in_4bit=True,
)
Adding LoRA with Unsloth
from unsloth import FastLanguageModel
# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="unsloth/Llama-3.2-3B-Instruct",
max_seq_length=2048,
load_in_4bit=True,
)
# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
model,
r=16,
lora_alpha=16,
lora_dropout=0,
target_modules=[
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"
],
bias="none",
use_gradient_checkpointing="unsloth", # Unsloth's optimized checkpointing
random_state=42,
)
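As a quick sanity check after adding the adapters, you can count trainable versus total parameters yourself; with r=16 on a 3B model, typically well under 1% of the weights should be trainable. A minimal sketch using the model from above:
# Count trainable vs. total parameters to confirm only the LoRA adapters update
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
PEFT's model.print_trainable_parameters() reports the same numbers.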
Key Configuration Options
max_seq_length
Maximum sequence length for training:
# Short sequences (chat, simple tasks)
max_seq_length = 1024
# Medium sequences (most use cases)
max_seq_length = 2048
# Long sequences (documents, code)
max_seq_length = 4096
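If you are unsure which value fits your data, a rough approach is to tokenize a sample of your dataset and look at the length distribution, then pick a value that covers most examples. A minimal sketch, assuming a hypothetical texts list of training strings and the tokenizer loaded earlier:
# Estimate a reasonable max_seq_length from your own data (texts is hypothetical)
lengths = sorted(len(tokenizer(t)["input_ids"]) for t in texts)
p95 = lengths[int(0.95 * (len(lengths) - 1))]
print(f"longest: {lengths[-1]} tokens, 95th percentile: {p95} tokens")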
load_in_4bit
Enable 4-bit quantization:
# QLoRA (recommended for most cases)
load_in_4bit = True
# Full precision (if you have enough VRAM)
load_in_4bit = False
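A rough way to decide: weight memory is about parameter count times bytes per parameter (2 bytes for fp16/bf16, roughly 0.5 bytes for 4-bit), before activations, optimizer state, and CUDA overhead. A back-of-envelope sketch for a 3B model:
# Back-of-envelope weight memory for a 3B-parameter model (weights only)
params = 3e9
print(f"fp16 : {params * 2 / 1e9:.0f} GB")    # ~6 GB
print(f"4-bit: {params * 0.5 / 1e9:.1f} GB")  # ~1.5 GB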
use_gradient_checkpointing
Unsloth's optimized gradient checkpointing:
# Unsloth optimized (recommended)
use_gradient_checkpointing = "unsloth"
# Standard PyTorch
use_gradient_checkpointing = True
# Disabled (more VRAM, faster)
use_gradient_checkpointing = False
Memory Comparison
Approximate VRAM when loading an 8B model such as Llama 3.1 8B:
| Method | VRAM Usage |
|---|---|
| Standard fp16 | 16GB |
| Standard 4-bit | 6GB |
| Unsloth 4-bit | 4GB |
| Unsloth 4-bit + checkpointing | 3GB |
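These figures are approximate; exact usage depends on your GPU, sequence length, and batch size. You can measure peak usage on your own setup with PyTorch's memory statistics (a minimal sketch):
import torch
torch.cuda.reset_peak_memory_stats()
# ... load the model and/or run a training step here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM: {peak_gb:.2f} GB")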
Complete Setup Example
from unsloth import FastLanguageModel
import torch
# Configuration
model_name = "unsloth/Llama-3.2-3B-Instruct"
max_seq_length = 2048
load_in_4bit = True
# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
model_name=model_name,
max_seq_length=max_seq_length,
load_in_4bit=load_in_4bit,
dtype=None,
)
# Add LoRA
model = FastLanguageModel.get_peft_model(
model,
r=16,
lora_alpha=16,
lora_dropout=0,
target_modules=[
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"
],
bias="none",
use_gradient_checkpointing="unsloth",
random_state=42,
)
# Check model
print(f"Model loaded on: {model.device}")
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
# Setup tokenizer
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
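Before moving on to training, a quick generation confirms the model and tokenizer work together. This is a minimal smoke test; FastLanguageModel.for_inference switches Unsloth into its faster inference mode, and the prompt is just an example:
# Quick smoke test: generate a short reply before training
FastLanguageModel.for_inference(model)
messages = [{"role": "user", "content": "Say hello in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
If you run this, switch back with FastLanguageModel.for_training(model) before starting fine-tuning.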
Troubleshooting
Import Errors
# If unsloth import fails, try:
!pip install --upgrade unsloth
CUDA Errors
# Check CUDA compatibility
import torch
print(torch.cuda.get_arch_list())
# Ensure correct CUDA version
!nvcc --version
Memory Errors
# Reduce sequence length
max_seq_length = 1024
# Enable aggressive checkpointing
use_gradient_checkpointing = "unsloth"
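If you hit out-of-memory errors after a failed run in the same notebook session, freeing the old model and clearing the CUDA cache often helps before retrying (a small sketch; assumes model is the object left over from the failed attempt):
import gc
import torch
del model             # drop references to the previous model
gc.collect()
torch.cuda.empty_cache()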
Tip: Always use the unsloth/-prefixed models from HuggingFace when possible - they're pre-optimized for the best performance.
Next, let's train a model using Unsloth with SFTTrainer.