Git Integration: Version-Controlling Notebooks and Code
Who this is for:
Architecture / Concept Overview
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
GIT[Git Provider<br/>GitHub / Azure DevOps / GitLab]:::source
REPOS[Databricks Repos]:::ingestion
BRANCH[Feature Branch]:::processing
NB[Notebooks & Code]:::storage
JOBS[Production Jobs]:::serving
CI[CI/CD Pipeline]:::governance
GIT <--> REPOS
REPOS --> BRANCH --> NB
GIT --> CI --> JOBS
*Databricks Repos syncs with remote Git repositories, enabling branch-based development and CI/CD-triggered production deployments.*
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
DEV[Developer]:::source
FEAT[Create Feature Branch]:::ingestion
CODE[Edit Notebooks/Code]:::processing
COMMIT[Commit & Push]:::storage
PR[Pull Request & Review]:::serving
MERGE[Merge to Main]:::governance
DEPLOY[Deploy to Production]:::governance
DEV --> FEAT --> CODE --> COMMIT --> PR --> MERGE --> DEPLOY
*The Git workflow in Databricks follows standard branch-based development with pull requests and code review.*
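
As a rough illustration of the edit, commit, and push stages in the workflow above, the following sketch builds the equivalent command-line Git sequence without executing it. The branch name, commit message, and the helper `feature_branch_commands` are hypothetical, and the pull request, review, and merge steps are assumed to happen in the Git provider's own interface rather than in this script.

```python
from typing import List


def feature_branch_commands(
    branch: str, message: str, remote: str = "origin"
) -> List[List[str]]:
    """Return the git commands for the branch -> commit -> push stages.

    Hypothetical helper for illustration; it only builds the argument
    lists and never runs git itself.
    """
    # Guard against committing straight to the default branch.
    if not branch or branch in {"main", "master"}:
        raise ValueError("use a dedicated feature branch, not the default branch")
    return [
        ["git", "checkout", "-b", branch],                  # create feature branch
        ["git", "add", "--all"],                            # stage edited notebooks/code
        ["git", "commit", "-m", message],                   # commit
        ["git", "push", "--set-upstream", remote, branch],  # push for review
    ]


if __name__ == "__main__":
    # Example values are placeholders, not taken from the source document.
    for cmd in feature_branch_commands("feature/add-etl-notebook", "Add ETL notebook"):
        print(" ".join(cmd))
```

Returning argument lists instead of running them keeps the sketch safe to try anywhere; each list could be passed to `subprocess.run` inside a cloned repository if actual execution is wanted.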
Key Terms
Prerequisites and Setup
- A Databricks workspace with Repos enabled (this is the default)
- A Git provider account (GitHub, Azure DevOps, GitLab, Bitbucket)
- A personal access token or OAuth app for Git authentication
- A repository containing notebooks (`.py`, `.sql`, `.r`, `.scala`) or `.ipynb` files
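
With these prerequisites in place, cloning a repository into the workspace can also be scripted. The sketch below only assembles a request body of the kind the Databricks Repos REST API (`POST /api/2.0/repos`) is understood to accept; the field names and provider identifiers reflect that API as best understood here and should be checked against the current documentation. The URL, workspace path, and the helper `build_repo_request` are hypothetical.

```python
import json
from typing import Dict, List, Optional

# Assumed mapping from the provider names used in this guide to the
# identifiers expected by the Repos API; verify before relying on it.
PROVIDERS: Dict[str, str] = {
    "GitHub": "gitHub",
    "Azure DevOps": "azureDevOpsServices",
    "GitLab": "gitLab",
    "Bitbucket": "bitbucketCloud",
}


def build_repo_request(
    url: str,
    provider: str,
    path: str,
    sparse_patterns: Optional[List[str]] = None,
) -> dict:
    """Build (but do not send) a request body for creating a repo."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    payload: dict = {"url": url, "provider": PROVIDERS[provider], "path": path}
    if sparse_patterns:
        # Sparse checkout: clone only the listed subdirectories.
        payload["sparse_checkout"] = {"patterns": list(sparse_patterns)}
    return payload


if __name__ == "__main__":
    body = build_repo_request(
        url="https://github.com/example-org/example-repo.git",  # placeholder
        provider="GitHub",
        path="/Repos/someone@example.com/example-repo",         # placeholder
        sparse_patterns=["notebooks"],
    )
    print(json.dumps(body, indent=2))
```

Sending the body would additionally require the workspace URL and an authentication token, which are deliberately left out of this sketch.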
Step-by-Step Implementation
Configuration Reference
| Setting | Location | Description |
|---|---|---|
| Git Provider | User Settings | GitHub, Azure DevOps, GitLab, Bitbucket |
| Git Credential | User Settings | PAT or OAuth token for the provider |
| Repo Path | Workspace | Location of the cloned repository |
| Default Branch | Repository settings | Branch used for production references |
| Sparse Checkout | Repo configuration | Clone only specific subdirectories |
| Git Source (Jobs) | Job configuration | Pin job execution to a specific branch/tag/commit |
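
To make the "Git Source (Jobs)" row more concrete, here is a minimal sketch of how a job's Git source block might be assembled so that it pins execution to exactly one branch, tag, or commit. The keys `git_url`, `git_provider`, `git_branch`, `git_tag`, and `git_commit` follow the Jobs API 2.1 `git_source` object as best understood here; treat them as assumptions to verify. The repository URL, tag, and the helper `build_git_source` are hypothetical.

```python
from typing import Optional


def build_git_source(
    git_url: str,
    git_provider: str,
    *,
    branch: Optional[str] = None,
    tag: Optional[str] = None,
    commit: Optional[str] = None,
) -> dict:
    """Build a git_source block pinned to exactly one Git reference."""
    refs = {"git_branch": branch, "git_tag": tag, "git_commit": commit}
    chosen = {key: value for key, value in refs.items() if value}
    # A job should resolve to one unambiguous reference.
    if len(chosen) != 1:
        raise ValueError("specify exactly one of branch, tag, or commit")
    return {"git_url": git_url, "git_provider": git_provider, **chosen}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    source = build_git_source(
        "https://github.com/example-org/example-repo.git",
        "gitHub",
        tag="v1.0.0",
    )
    print(source)
```

Pinning production jobs to a tag or commit gives reproducible runs, while pinning to a branch picks up whatever was merged most recently; which trade-off fits depends on the release process.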