Choosing Your Approach

Now that you understand the different fine-tuning methods, let's build a decision framework to help you choose the right approach for your specific situation.

The Decision Tree

START: Do you need to customize a model?
├─ NO → Use the base model with good prompting
└─ YES → How much VRAM do you have?
         ├─ <8GB → Use API fine-tuning (OpenAI, Together AI)
         ├─ 8-16GB → QLoRA on 7-13B models
         ├─ 16-24GB → QLoRA on 13-34B models
         └─ 24GB+ → QLoRA on larger models (70B needs ~48GB), or LoRA on smaller ones
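
If it helps to see the same logic in code, here is a minimal sketch of the decision tree as a Python helper (the function name and return strings are just illustrative):

# The decision tree above, expressed as a helper function
def choose_approach(needs_customization: bool, vram_gb: float) -> str:
    if not needs_customization:
        return "Use the base model with good prompting"
    if vram_gb < 8:
        return "API fine-tuning (OpenAI, Together AI)"
    if vram_gb < 16:
        return "QLoRA on 7-13B models"
    if vram_gb < 24:
        return "QLoRA on 13-34B models"
    return "QLoRA on larger models, or LoRA on smaller ones"

print(choose_approach(True, 16))  # "QLoRA on 13-34B models"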

Key Questions to Ask

1. What's Your Hardware?

VRAM    Recommended Approach                       Max Model Size
8GB     QLoRA + Unsloth                            7B
16GB    QLoRA + Unsloth                            13B
24GB    QLoRA (LoRA for 7-8B models)               ~34B (QLoRA)
48GB+   QLoRA; LoRA or full FT on smaller models   70B (QLoRA), ~13B (LoRA)
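
For a back-of-the-envelope check of these numbers, 4-bit weights take roughly half a byte per parameter; actual usage is higher once activations, LoRA gradients, and optimizer state are added:

# Rough weight-only memory for a 4-bit (QLoRA) base model
def approx_4bit_weight_gb(params_billion: float) -> float:
    return params_billion * 1e9 * 0.5 / 1024**3

print(round(approx_4bit_weight_gb(7), 1))   # ~3.3 GB (7B, weights only)
print(round(approx_4bit_weight_gb(70), 1))  # ~32.6 GB (70B, weights only)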

2. What's Your Goal?

Goal                       Recommended Method   Why
Teach domain knowledge     SFT                  Trains on instruction-response pairs
Improve response quality   SFT + DPO            DPO aligns outputs to preferences
Change output format       SFT                  Easy to learn structured outputs
Reduce harmful outputs     DPO                  Preference learning is ideal
Maximum customization      Full fine-tuning     Updates all parameters

3. How Much Data Do You Have?

Dataset Size     Recommendation
<100 examples    Use few-shot prompting instead
100-1,000        SFT with LoRA, watch for overfitting
1,000-10,000     SFT with LoRA, optimal range
10,000+          SFT + DPO, or consider full fine-tuning
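
Whatever the size, carve out a validation split before training (the checklist at the end suggests 10-20%); a minimal sketch with the Hugging Face datasets library, assuming a hypothetical train.jsonl of instruction-response pairs:

from datasets import load_dataset

ds = load_dataset("json", data_files="train.jsonl", split="train")

# Hold out 10% for validation so overfitting is visible during training
splits = ds.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]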

Practical Recommendations

For Most Users: QLoRA + SFT

# The "safe default" configuration
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# LoRA config
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules="all-linear",
    lora_dropout=0.0
)
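
To put both configs to use, you would typically load the quantized base model and wrap it with the LoRA adapter; a sketch with transformers and peft (the base model name is just an example):

from transformers import AutoModelForCausalLM
from peft import get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",   # example base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # casts norms, enables checkpointing for k-bit training
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()              # only the LoRA adapters train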

Why this works:

  • Fits on consumer GPUs (8-24GB)
  • Excellent quality for most tasks
  • Fast training (hours, not days)
  • Easy to iterate and experiment

For Speed: Unsloth

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True
)
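
From there, Unsloth's FastLanguageModel.get_peft_model attaches the LoRA adapters; a minimal sketch (the hyperparameters shown are common starting points, and exact arguments can vary by Unsloth version):

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # Unsloth's memory-efficient checkpointing
)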

Why this works:

  • 2x faster than standard training
  • 70% less VRAM usage
  • Same quality results

For Quality: SFT + DPO Pipeline

Step 1: SFT (Supervised Fine-Tuning)
        Train on instruction-response pairs
Step 2: DPO (Direct Preference Optimization)
        Train on preference pairs (chosen/rejected)
Result: Model with both skills AND alignment
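
The two stages consume differently shaped records; these hypothetical examples show the difference (field names follow the prompt/chosen/rejected convention most DPO tooling expects):

# Stage 1 (SFT): one instruction-response pair
sft_example = {
    "prompt": "Summarize this support ticket: ...",
    "completion": "The customer cannot log in after the latest update ...",
}

# Stage 2 (DPO): a preference pair for the same prompt
dpo_example = {
    "prompt": "Summarize this support ticket: ...",
    "chosen": "Concise, accurate summary of the login failure ...",
    "rejected": "Rambling summary that misses the actual issue ...",
}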

Common Mistakes to Avoid

1. Skipping Data Quality

Wrong: "I'll just throw 50,000 examples at it" Right: 1,000 high-quality examples beat 50,000 noisy ones

2. Overfitting on Small Datasets

Wrong: Training for 10 epochs on 500 examples
Right: 1-3 epochs, use a validation set, monitor loss
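
One way to enforce this is to evaluate every epoch and keep only the best checkpoint; a sketch using transformers' TrainingArguments (argument names follow recent versions; older releases call eval_strategy evaluation_strategy):

from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="outputs",
    num_train_epochs=3,                 # stay in the 1-3 epoch range
    eval_strategy="epoch",              # run the validation set every epoch
    save_strategy="epoch",
    load_best_model_at_end=True,        # roll back to the lowest-validation-loss checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Pass EarlyStoppingCallback(early_stopping_patience=1) in your Trainer's callbacks
# to stop as soon as validation loss stops improving.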

3. Wrong Model Size

Wrong: Fine-tuning 70B when 7B would suffice
Right: Start small, scale up only if needed

4. Ignoring Base Model Choice

Wrong: Fine-tuning any random model
Right: Choose a base model already good at your task type

Quick Start Recommendations

Your Situation              Do This
First time fine-tuning      QLoRA + Unsloth on Llama 3.2 3B
Need production quality     QLoRA on Llama 3.1 8B or Mistral 7B
Limited VRAM                QLoRA + Unsloth with gradient checkpointing (sketch below)
Need best possible quality  LoRA on 70B or full fine-tune
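
For the limited-VRAM row, the usual levers are gradient checkpointing, a per-device batch size of 1, and gradient accumulation; a hedged sketch with transformers' TrainingArguments:

from transformers import TrainingArguments

low_vram_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,   # effective batch size of 16
    gradient_checkpointing=True,      # recompute activations to save memory
    bf16=True,                        # use fp16=True on GPUs without bfloat16 support
)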

Summary Checklist

Before you start fine-tuning:

  • Defined clear success metrics
  • Collected 500+ high-quality training examples
  • Chosen base model appropriate for task
  • Verified hardware can handle chosen approach
  • Prepared validation set (10-20% of data)
  • Set up experiment tracking (wandb, mlflow)
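
For the experiment-tracking item, both tools plug into the Hugging Face Trainer via report_to; a minimal sketch for Weights & Biases (the project name is hypothetical):

import wandb
from transformers import TrainingArguments

wandb.init(project="finetuning-experiments")  # hypothetical project name
args = TrainingArguments(output_dir="outputs", report_to="wandb")
# Swap report_to="mlflow" for MLflow, or "none" to disable tracking.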

In the next module, we'll dive deep into preparing your training dataset: the most critical factor for fine-tuning success.
