Lesson 4 of 24

Understanding Fine-tuning

Choosing Your Approach


Now that you understand the different fine-tuning methods, let's build a decision framework to help you choose the right approach for your specific situation.

The Decision Tree

START: Do you need to customize a model?
├─ NO → Use the base model with good prompting
└─ YES → How much VRAM do you have?
         ├─ <8GB → Use API fine-tuning (OpenAI, Together AI)
         ├─ 8-16GB → QLoRA on 7-8B models
         ├─ 16-24GB → QLoRA on 7-70B models
         └─ 24GB+ → LoRA or QLoRA on any model
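The tree above can be sketched as a small helper function; the thresholds and labels mirror the branches exactly (the function name is illustrative):

```python
def recommend_approach(vram_gb: float) -> str:
    """Map available VRAM to a fine-tuning approach, following the decision tree."""
    if vram_gb < 8:
        return "API fine-tuning (OpenAI, Together AI)"
    if vram_gb < 16:
        return "QLoRA on 7-8B models"
    if vram_gb < 24:
        return "QLoRA on 7-70B models"
    return "LoRA or QLoRA on any model"

# A 12GB card lands in the second branch:
print(recommend_approach(12))  # QLoRA on 7-8B models
```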

Key Questions to Ask

1. What's Your Hardware?

| VRAM | Recommended Approach | Max Model Size |
|------|----------------------|----------------|
| 8GB  | QLoRA + Unsloth | 7B |
| 16GB | QLoRA + Unsloth | 13B |
| 24GB | QLoRA or LoRA | 70B (QLoRA) |
| 48GB+ | LoRA or Full FT | 70B (LoRA) |
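A rough back-of-the-envelope check explains these pairings: weights alone take `parameters × bits ÷ 8` bytes, and real training adds overhead for optimizer states, activations, and CUDA buffers on top. A minimal sketch:

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone (excludes optimizer
    states, activations, and CUDA overhead)."""
    return n_params * bits_per_param / 8 / 1e9

# A 7B model needs ~14 GB just for fp16 weights, but only ~3.5 GB at
# 4-bit -- which is why QLoRA can fit a 7B model on an 8GB card.
fp16 = weight_memory_gb(7e9, 16)   # 14.0
nf4 = weight_memory_gb(7e9, 4)     # 3.5
```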

2. What's Your Goal?

| Goal | Recommended Method | Why |
|------|--------------------|-----|
| Teach domain knowledge | SFT | Trains on instruction-response pairs |
| Improve response quality | SFT + DPO | DPO aligns outputs to preferences |
| Change output format | SFT | Easy to learn structured outputs |
| Reduce harmful outputs | DPO | Preference learning is ideal |
| Maximum customization | Full fine-tuning | Updates all parameters |

3. How Much Data Do You Have?

| Dataset Size | Recommendation |
|--------------|----------------|
| <100 examples | Use few-shot prompting instead |
| 100-1,000 | SFT with LoRA; watch for overfitting |
| 1,000-10,000 | SFT with LoRA; optimal range |
| 10,000+ | SFT + DPO, or consider full fine-tuning |

Practical Recommendations

For Most Users: QLoRA + SFT

# The "safe default" configuration
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# LoRA config
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules="all-linear",
    lora_dropout=0.0
)
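To see why the LoRA half of this is so cheap: each targeted weight matrix W (shape d_out × d_in) gets two low-rank factors, A (r × d_in) and B (d_out × r), so only r·(d_in + d_out) extra parameters train per matrix. A quick sketch, assuming a Llama-style 4096 hidden size:

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one weight matrix:
    A is (r x d_in) and B is (d_out x r)."""
    return r * (d_in + d_out)

full = 4096 * 4096                           # one full attention projection
lora = lora_trainable_params(4096, 4096, 16) # the r=16 adapter for it
ratio = lora / full                          # well under 1% of the matrix
```

With r=16 that is 131,072 trainable parameters against ~16.8M frozen ones per projection, which is why the adapters fit comfortably in VRAM alongside the 4-bit base model.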

Why this works:

  • Fits on consumer GPUs (8-24GB)
  • Excellent quality for most tasks
  • Fast training (hours, not days)
  • Easy to iterate and experiment

For Speed: Unsloth

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True
)

Why this works:

  • 2x faster than standard training
  • 70% less VRAM usage
  • Same quality results

For Quality: SFT + DPO Pipeline

Step 1: SFT (Supervised Fine-Tuning)
        Train on instruction-response pairs
Step 2: DPO (Direct Preference Optimization)
        Train on preference pairs (chosen/rejected)
Result: Model with both skills AND alignment
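Step 2's training objective is simple enough to write down: DPO compares how much the policy has moved away from the frozen reference model on the chosen versus the rejected response. A minimal scalar sketch for one preference pair (real implementations, such as TRL's `DPOTrainer`, operate on batches of token log-probabilities):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given the summed log-probs of the
    chosen/rejected responses under the policy (pi_*) and the frozen
    reference model (ref_*)."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1 / (1 + math.exp(-margin)))  # -log(sigmoid(margin))

# Before training, policy == reference, so the loss starts at log(2):
start = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

The loss drops as the policy raises the chosen response's probability relative to the rejected one, which is exactly the "alignment" the pipeline adds on top of SFT.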

Common Mistakes to Avoid

1. Skipping Data Quality

Wrong: "I'll just throw 50,000 examples at it"
Right: 1,000 high-quality examples beat 50,000 noisy ones

2. Overfitting on Small Datasets

Wrong: Training for 10 epochs on 500 examples
Right: 1-3 epochs, use a validation set, monitor loss
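"Monitor loss" can be mechanical: stop once the validation loss has stalled for a couple of evaluations. A minimal early-stopping check (the patience value of 2 is an illustrative choice):

```python
def should_stop(val_losses, patience=2):
    """Stop when the best validation loss is more than `patience`
    evaluations old, i.e. the model has stopped improving."""
    best_idx = val_losses.index(min(val_losses))
    return (len(val_losses) - 1 - best_idx) >= patience

# Loss has risen for two evals in a row -> stop, even mid-epoch
stop_now = should_stop([1.10, 0.92, 0.95, 0.99])   # True
keep_going = should_stop([1.10, 0.92, 0.85])       # False
```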

3. Wrong Model Size

Wrong: Fine-tuning 70B when 7B would suffice
Right: Start small, scale up only if needed

4. Ignoring Base Model Choice

Wrong: Fine-tuning any random model
Right: Choose a base model already good at your task type

Quick Start Recommendations

| Your Situation | Do This |
|----------------|---------|
| First time fine-tuning | QLoRA + Unsloth on Llama 3.2 3B |
| Need production quality | QLoRA on Llama 3.1 8B or Mistral 7B |
| Limited VRAM | QLoRA + Unsloth, gradient checkpointing |
| Need best possible quality | LoRA on 70B or full fine-tune |

Summary Checklist

Before you start fine-tuning:

  • Defined clear success metrics
  • Collected 500+ high-quality training examples
  • Chosen base model appropriate for task
  • Verified hardware can handle chosen approach
  • Prepared validation set (10-20% of data)
  • Set up experiment tracking (wandb, mlflow)
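The validation-set item on this checklist is a one-liner to get right: hold out a shuffled slice with a fixed seed so the split stays reproducible across experiments. A minimal sketch:

```python
import random

def train_val_split(examples, val_fraction=0.1, seed=42):
    """Hold out val_fraction of the data as a validation set; the fixed
    seed makes the split reproducible across runs."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(len(examples) * val_fraction))
    val = [examples[i] for i in idx[:n_val]]
    train = [examples[i] for i in idx[n_val:]]
    return train, val
```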

In the next module, we'll dive deep into preparing your training dataset—the most critical factor for fine-tuning success.
