Lesson 14 of 24

Training with Unsloth

Unsloth Setup

Setting up Unsloth is straightforward. Let's get your environment ready for fast fine-tuning.

Installation

Standard Installation

pip install unsloth

With Specific CUDA Version

# For CUDA 12.1
pip install unsloth[cu121]

# For CUDA 11.8
pip install unsloth[cu118]

Colab Installation

# Run this cell first in Google Colab
!pip install unsloth

Full Installation with Dependencies

pip install unsloth transformers datasets trl peft accelerate bitsandbytes

Verify Installation

# Check Unsloth installation
import unsloth
print(f"Unsloth version: {unsloth.__version__}")

# Verify FastLanguageModel is available
from unsloth import FastLanguageModel
print("FastLanguageModel imported successfully!")

# Check GPU
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")

Loading Models with Unsloth

FastLanguageModel

The core class for loading models:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
    dtype=None,  # Auto-detect (bfloat16 on modern GPUs)
)

Available Pre-optimized Models

Unsloth provides pre-optimized models on HuggingFace:

# Llama models
"unsloth/Llama-3.2-1B-Instruct"
"unsloth/Llama-3.2-3B-Instruct"
"unsloth/Llama-3.3-70B-Instruct"

# Mistral models
"unsloth/Mistral-7B-Instruct-v0.3"
"unsloth/Mixtral-8x7B-Instruct-v0.1"

# Phi models
"unsloth/Phi-4"

# Qwen models
"unsloth/Qwen2.5-7B-Instruct"
"unsloth/Qwen2.5-72B-Instruct"

# Gemma models
"unsloth/gemma-2-9b-it"

Using Standard HuggingFace Models

You can also use regular HuggingFace model IDs:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-3B-Instruct",  # Standard HF ID
    max_seq_length=2048,
    load_in_4bit=True,
)
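
Note that some standard repositories (such as the meta-llama ones) are gated: you have to accept the license on HuggingFace and authenticate before loading. A minimal sketch, assuming you already have an access token:

from huggingface_hub import login

# Authenticate with your HuggingFace token (or set the HF_TOKEN environment variable)
login(token="hf_...")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)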

Adding LoRA with Unsloth

from unsloth import FastLanguageModel

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth's optimized checkpointing
    random_state=42,
)

Key Configuration Options

max_seq_length

Maximum sequence length for training:

# Short sequences (chat, simple tasks)
max_seq_length = 1024

# Medium sequences (most use cases)
max_seq_length = 2048

# Long sequences (documents, code)
max_seq_length = 4096
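
If you're unsure which value fits your data, a quick check is to tokenize a sample of your training examples and look at the length distribution. A minimal sketch, assuming texts is a list of your formatted training strings:

# Inspect token lengths to choose a sensible max_seq_length
lengths = [len(tokenizer(t)["input_ids"]) for t in texts]
print(f"max: {max(lengths)}, mean: {sum(lengths) / len(lengths):.0f}")
# Choose a value that covers most examples; longer sequences cost more VRAM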

load_in_4bit

Enable 4-bit quantization:

# QLoRA (recommended for most cases)
load_in_4bit = True

# Full precision (if you have enough VRAM)
load_in_4bit = False

use_gradient_checkpointing

Unsloth's optimized gradient checkpointing:

# Unsloth optimized (recommended)
use_gradient_checkpointing = "unsloth"

# Standard PyTorch
use_gradient_checkpointing = True

# Disabled (more VRAM, faster)
use_gradient_checkpointing = False

Memory Comparison

Loading Llama 3.1 8B (approximate figures):

Method                          VRAM usage
Standard fp16                   16 GB
Standard 4-bit                  6 GB
Unsloth 4-bit                   4 GB
Unsloth 4-bit + checkpointing   3 GB
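
Exact numbers vary by GPU and settings; you can check what your own setup actually uses right after loading the model with PyTorch's memory counters:

import torch

# Allocated = memory used by tensors; reserved also includes PyTorch's cache
print(f"Allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
print(f"Reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GB")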

Complete Setup Example

from unsloth import FastLanguageModel
import torch

# Configuration
model_name = "unsloth/Llama-3.2-3B-Instruct"
max_seq_length = 2048
load_in_4bit = True

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
    dtype=None,
)

# Add LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=42,
)

# Check model
print(f"Model loaded on: {model.device}")
print(f"Trainable parameters: {model.print_trainable_parameters()}")

# Setup tokenizer
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
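
Before launching training, it can help to run a quick generation to confirm the model and tokenizer work together. A minimal sketch using Unsloth's inference helpers (for_inference / for_training toggle the model between its faster inference mode and training mode):

# Quick sanity check: generate a short completion
FastLanguageModel.for_inference(model)

inputs = tokenizer("Write one sentence about llamas.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Switch back to training mode before fine-tuning
FastLanguageModel.for_training(model)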

Troubleshooting

Import Errors

# If unsloth import fails, try:
!pip install --upgrade unsloth

CUDA Errors

# Check which CUDA version PyTorch was built with and the supported architectures
import torch
print(torch.version.cuda)
print(torch.cuda.get_arch_list())

# Check the system CUDA toolkit version
!nvcc --version

Memory Errors

# Reduce sequence length
max_seq_length = 1024

# Enable aggressive checkpointing
use_gradient_checkpointing = "unsloth"

Tip: When possible, use the unsloth/-prefixed models from HuggingFace; they're pre-optimized for the best performance.

Next, let's train a model using Unsloth with SFTTrainer.
