Understanding Fine-tuning
Choosing Your Approach
Now that you understand the different fine-tuning methods, let's build a decision framework to help you choose the right approach for your specific situation.
The Decision Tree
START: Do you need to customize a model?
│
├─ NO → Use the base model with good prompting
│
└─ YES → How much VRAM do you have?
│
├─ <8GB → Use API fine-tuning (OpenAI, Together AI)
│
├─ 8-16GB → QLoRA on 7-13B models
│
├─ 16-24GB → QLoRA on 13-34B models
│
└─ 24GB+ → LoRA on smaller models, or QLoRA up to 70B (70B-class needs roughly 48GB even with QLoRA)
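If it helps to see the same logic in code, here is a purely illustrative helper; the function name is made up and the thresholds simply restate the tree and the hardware table below.

def recommend_approach(vram_gb: float) -> str:
    """Map available GPU VRAM (in GB) to a starting fine-tuning approach."""
    if vram_gb < 8:
        return "API fine-tuning (OpenAI, Together AI)"
    if vram_gb < 16:
        return "QLoRA on 7-13B models"
    if vram_gb < 24:
        return "QLoRA on 13-34B models"
    if vram_gb < 48:
        return "QLoRA up to ~34B, or LoRA on smaller models"
    return "QLoRA on 70B, or LoRA / full fine-tuning on smaller models"

print(recommend_approach(24))  # -> "QLoRA up to ~34B, or LoRA on smaller models"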
Key Questions to Ask
1. What's Your Hardware?
| VRAM | Recommended Approach | Max Model Size |
|---|---|---|
| 8GB | QLoRA + Unsloth | 7B |
| 16GB | QLoRA + Unsloth | 13B |
| 24GB | QLoRA or LoRA | ~34B (QLoRA) |
| 48GB+ | QLoRA, LoRA, or Full FT | 70B (QLoRA) |
2. What's Your Goal?
| Goal | Recommended Method | Why |
|---|---|---|
| Teach domain knowledge | SFT | Trains on instruction-response pairs |
| Improve response quality | SFT + DPO | DPO aligns outputs to preferences |
| Change output format | SFT | Easy to learn structured outputs |
| Reduce harmful outputs | DPO | Preference learning is ideal |
| Maximum customization | Full fine-tuning | Updates all parameters |
3. How Much Data Do You Have?
| Dataset Size | Recommendation |
|---|---|
| <100 examples | Use few-shot prompting instead |
| 100-1,000 | SFT with LoRA, watch for overfitting |
| 1,000-10,000 | SFT with LoRA, optimal range |
| 10,000+ | SFT + DPO, or consider full fine-tuning |
Practical Recommendations
For Most Users: QLoRA + SFT
# The "safe default" configuration
from peft import LoraConfig
from transformers import BitsAndBytesConfig
# 4-bit quantization
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)
# LoRA config
lora_config = LoraConfig(
r=16,
lora_alpha=16,
target_modules="all-linear",
lora_dropout=0.0
)
Why this works:
- Fits on consumer GPUs (8-24GB)
- Excellent quality for most tasks
- Fast training (hours, not days)
- Easy to iterate and experiment
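To see how the two configs above plug into an actual training run, here is a minimal sketch using Transformers and TRL. The model name and data file are placeholders, and the exact SFTTrainer arguments can vary slightly between TRL versions (older releases take tokenizer= instead of processing_class=).

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Placeholder base model and training file -- swap in your own.
# The dataset should contain a "text" or "messages" column for SFTTrainer.
model_name = "meta-llama/Llama-3.1-8B-Instruct"
dataset = load_dataset("json", data_files="train.jsonl", split="train")

# Load the base model in 4-bit using bnb_config from the snippet above
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# SFTTrainer attaches the LoRA adapters when given peft_config
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    processing_class=tokenizer,
    args=SFTConfig(output_dir="qlora-sft", num_train_epochs=2),
)
trainer.train()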
For Speed: Unsloth
from unsloth import FastLanguageModel

# Load a 4-bit quantized model through Unsloth's memory-optimized loader
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)
Why this works:
- Roughly 2x faster than standard training
- Up to 70% less VRAM usage
- Same quality results
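Unsloth attaches the LoRA adapters through its own helper rather than a separate LoraConfig. A sketch following the pattern in Unsloth's examples (argument names can shift between releases):

# Add LoRA adapters to the Unsloth-loaded model
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # trades a little compute for a lot of VRAM
)
# Training then proceeds with the same SFTTrainer setup shown earlier.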
For Quality: SFT + DPO Pipeline
Step 1: SFT (Supervised Fine-Tuning)
Train on instruction-response pairs
↓
Step 2: DPO (Direct Preference Optimization)
Train on preference pairs (chosen/rejected)
↓
Result: Model with both skills AND alignment
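In code, the second stage looks roughly like the sketch below, using TRL's DPOTrainer on the model and tokenizer produced by the SFT step. The preference file is a placeholder and must contain prompt, chosen, and rejected columns; recent TRL releases use processing_class= where older ones used tokenizer=.

from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Placeholder preference data with "prompt", "chosen", and "rejected" columns
pref_dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

# `model` and `tokenizer` come from the SFT step. With ref_model=None, TRL
# builds the frozen reference model itself (when training LoRA adapters it
# simply disables them for the reference forward pass).
trainer = DPOTrainer(
    model=model,
    ref_model=None,
    args=DPOConfig(output_dir="dpo", beta=0.1, num_train_epochs=1),
    train_dataset=pref_dataset,
    processing_class=tokenizer,
)
trainer.train()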
Common Mistakes to Avoid
1. Skipping Data Quality
Wrong: "I'll just throw 50,000 examples at it."
Right: 1,000 high-quality examples beat 50,000 noisy ones.
2. Overfitting on Small Datasets
Wrong: Training for 10 epochs on 500 examples.
Right: Train for 1-3 epochs, use a validation set, and monitor loss.
3. Wrong Model Size
Wrong: Fine-tuning a 70B model when a 7B would suffice.
Right: Start small and scale up only if needed.
4. Ignoring Base Model Choice
Wrong: Fine-tuning any random model.
Right: Choose a base model that is already good at your task type.
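Mistake 2 in particular can be guarded against directly in configuration. A sketch of the relevant training arguments (values are illustrative; eval_strategy was called evaluation_strategy in older Transformers releases):

from trl import SFTConfig

# Conservative settings for a small (roughly 500-1,000 example) dataset
args = SFTConfig(
    output_dir="sft-small",
    num_train_epochs=2,                # stay in the 1-3 epoch range
    eval_strategy="steps",             # evaluate on the held-out split as you go
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,
    logging_steps=10,                  # watch train vs. eval loss for divergence
    load_best_model_at_end=True,       # keep the checkpoint with the lowest eval loss
    metric_for_best_model="eval_loss",
)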
Quick Start Recommendations
| Your Situation | Do This |
|---|---|
| First time fine-tuning | QLoRA + Unsloth on Llama 3.2 3B |
| Need production quality | QLoRA on Llama 3.1 8B or Mistral 7B |
| Limited VRAM | QLoRA + Unsloth, gradient checkpointing |
| Need best possible quality | LoRA on 70B or full fine-tune |
Summary Checklist
Before you start fine-tuning:
- Defined clear success metrics
- Collected 500+ high-quality training examples
- Chosen base model appropriate for task
- Verified hardware can handle chosen approach
- Prepared validation set (10-20% of data)
- Set up experiment tracking (wandb, mlflow)
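For the validation-set item, the split itself is a one-liner with the datasets library (the file name is a placeholder):

from datasets import load_dataset

# Hold out 10% of the examples for validation
dataset = load_dataset("json", data_files="train.jsonl", split="train")
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]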
In the next module, we'll dive deep into preparing your training dataset: the most critical factor for fine-tuning success.