Foundation Model Fine-Tuning: Customising LLMs with Your Data
Who this is for:
Architecture / Concept Overview: Foundation Model Fine-Tuning: Customising LLMs with Your Data
Fine-tuning adapts a pre-trained foundation model to your data, producing a custom model that deploys to a serving endpoint.
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
DATA[Training Data - Delta] -->|Prepare| FORMAT[Chat Format Dataset]
FORMAT -->|Configure| FT_JOB[Fine-Tuning Job]
BASE[Base Foundation Model] -->|Adapt| FT_JOB
FT_JOB -->|Train| CKPT[Checkpoints]
CKPT -->|Register| UC[Unity Catalog]
UC -->|Deploy| EP[Custom Serving Endpoint]
EP -->|Evaluate| EVAL[Quality Evaluation]
DATA:::source
FORMAT:::ingestion
FT_JOB:::processing
BASE:::processing
CKPT:::storage
UC:::governance
EP:::serving
EVAL:::governance
*Fine-tuning pipeline: training data is formatted, the base model is adapted, and the result is deployed as a custom endpoint.*
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
FT[Fine-Tuning Methods] --> SFT[Supervised Fine-Tuning]
FT --> CPT[Continued Pre-Training]
SFT --> CHAT_FMT[Chat Format - instruction/response]
SFT --> COMPLETION_FMT[Completion Format]
CPT --> DOMAIN[Domain-Specific Corpus]
CPT --> LARGE_SCALE[Large-Scale Text]
FT:::governance
SFT:::processing
CPT:::storage
CHAT_FMT:::ingestion
COMPLETION_FMT:::ingestion
DOMAIN:::source
LARGE_SCALE:::source
*Fine-tuning methods: supervised fine-tuning (SFT) for task alignment and continued pre-training (CPT) for domain knowledge.*
Key Terms
Prerequisites and Setup
- Databricks workspace with Foundation Model Fine-Tuning enabled.
- Training data in a Delta table with the required format.
CREATE MODELprivilege in Unity Catalog.- For large models: sufficient GPU quota in your cloud account.
Step-by-Step Implementation
Configuration Reference
| Parameter | Default | Description |
|---|---|---|
model | — | Base model identifier (e.g., meta-llama/Meta-Llama-3.1-8B-Instruct) |
train_data_path | — | Path to training data in Delta format |
register_to | — | Unity Catalog model name for the output |
training_duration | 1ep | Duration: epochs (1ep) or tokens (1000000tok) |
learning_rate | 5e-6 | Training learning rate |
eval_data_path | None | Optional validation data path |
custom_weights_path | None | Resume from a previous checkpoint |