Git Integration: Version-Controlling Notebooks and Code

    Architecture / Concept Overview

    [Diagram: Git Provider (GitHub / Azure DevOps / GitLab) <-> Databricks Repos -> Feature Branch -> Notebooks & Code; Git Provider -> CI/CD Pipeline -> Production Jobs]

    *Databricks Repos syncs with remote Git repositories, enabling branch-based development and CI/CD-triggered production deployments.*
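    Repos can also be created programmatically. The sketch below assumes the Databricks Repos REST endpoint `POST /api/2.0/repos` with its `url`, `provider`, `path`, and optional `sparse_checkout.patterns` fields. The helper names, the short provider keys, and the example repository URL and workspace path are hypothetical; check the field names and provider identifiers against the REST API reference for your workspace before relying on them.

```python
import json
import urllib.request

# Assumed mapping from short names to the Repos API "provider" values;
# verify these identifiers against the Databricks REST API reference.
PROVIDERS = {
    "github": "gitHub",
    "azure-devops": "azureDevOpsServices",
    "gitlab": "gitLab",
    "bitbucket": "bitbucketCloud",
}


def build_create_repo_body(git_url, provider, workspace_path, sparse_patterns=None):
    """Build the JSON body for POST /api/2.0/repos (hypothetical helper)."""
    body = {
        "url": git_url,
        "provider": PROVIDERS[provider],
        "path": workspace_path,
    }
    if sparse_patterns:
        # Sparse checkout: clone only these subdirectories of the repository.
        body["sparse_checkout"] = {"patterns": list(sparse_patterns)}
    return body


def create_repo(host, token, body):
    """POST the body to the workspace; host and token come from the caller."""
    req = urllib.request.Request(
        f"{host}/api/2.0/repos",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Hypothetical repository and workspace path; no request is sent here.
    body = build_create_repo_body(
        "https://github.com/example-org/analytics.git",
        "github",
        "/Repos/someone@example.com/analytics",
        sparse_patterns=["notebooks", "src"],
    )
    print(json.dumps(body, indent=2))
```

    Keeping payload construction separate from the HTTP call lets the body be inspected or unit-tested without a workspace. The create response is expected to include the new repo's `id`, which later update calls use.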

    [Diagram: Developer -> Create Feature Branch -> Edit Notebooks/Code -> Commit & Push -> Pull Request & Review -> Merge to Main -> Deploy to Production]

    *The Git workflow in Databricks follows standard branch-based development with pull requests and code review.*
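    Inside the workspace, moving a Repo onto a feature branch (or back to `main` after a merge) is a checkout. The sketch below assumes the Repos endpoint `PATCH /api/2.0/repos/{repo_id}` accepts either a `branch` or a `tag` field, and that the feature branch already exists on the remote. The helper names, host, and repo ID are hypothetical.

```python
import json
import urllib.request


def build_checkout_request(host, repo_id, branch=None, tag=None):
    """Return (url, body) for PATCH /api/2.0/repos/{repo_id} (hypothetical helper).

    Exactly one of branch or tag is allowed: developers check out a branch,
    while a tag pins the Repo to a fixed, released revision.
    """
    if (branch is None) == (tag is None):
        raise ValueError("pass exactly one of branch or tag")
    body = {"branch": branch} if branch is not None else {"tag": tag}
    return f"{host}/api/2.0/repos/{repo_id}", body


def checkout(host, token, repo_id, branch=None, tag=None):
    """Send the PATCH request and return the decoded JSON response."""
    url, body = build_checkout_request(host, repo_id, branch=branch, tag=tag)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Hypothetical workspace host and repo ID; prints the request without sending it.
    print(build_checkout_request(
        "https://example.cloud.databricks.com", 123, branch="feature/etl-fix"
    ))
```

    In this sketch the call only changes which revision the workspace copy points at; commits, pushes, and pull requests still go through the Repos UI or the Git provider.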

    Prerequisites and Setup

    • A Databricks workspace with Repos enabled (enabled by default)
    • A Git provider account (GitHub, Azure DevOps, GitLab, Bitbucket)
    • A personal access token or OAuth app for Git authentication
    • A repository containing notebooks in source format (.py, .sql, .r, .scala) or as Jupyter (.ipynb) files
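    A source-format notebook is an ordinary text file whose first line is a `Databricks notebook source` header written in the language's comment syntax; a file with the same extension but without that header is assumed here to be treated as plain code rather than a notebook. The sketch below classifies repository files on that basis. The `classify` helper and its return labels are hypothetical, and checking only the first line is a simplification.

```python
from pathlib import Path

# First line Databricks writes into a notebook exported in source format.
# R shares the "#" comment prefix with Python.
SOURCE_HEADERS = {
    ".py": "# Databricks notebook source",
    ".r": "# Databricks notebook source",
    ".sql": "-- Databricks notebook source",
    ".scala": "// Databricks notebook source",
}


def classify(path):
    """Label a repo file 'notebook', 'ipynb', or 'file' (hypothetical helper)."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".ipynb":
        return "ipynb"  # Jupyter format, identified by extension alone
    header = SOURCE_HEADERS.get(suffix)
    if header is None:
        return "file"  # not a notebook language extension
    with p.open(encoding="utf-8", errors="replace") as fh:
        first_line = fh.readline().rstrip("\r\n")
    # Source-format notebooks start with the header; other files are plain code.
    return "notebook" if first_line == header else "file"
```

    Within such a notebook, cells are separated by `COMMAND ----------` marker comments (for example `# COMMAND ----------` in Python), which keeps diffs line-oriented and readable in pull requests.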

    Step-by-Step Implementation

      Configuration Reference

      Configuration options for Git integration
      Setting             Location              Description
      Git Provider        User Settings         GitHub, Azure DevOps, GitLab, or Bitbucket
      Git Credential      User Settings         PAT or OAuth token for the provider
      Repo Path           Workspace             Location of the cloned repository
      Default Branch      Repository settings   Branch used for production references
      Sparse Checkout     Repo configuration    Clone only specific subdirectories
      Git Source (Jobs)   Job configuration     Pin job execution to a specific branch, tag, or commit
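      The Git Source (Jobs) setting corresponds to the `git_source` block of a job definition. The sketch below builds a job specification in the shape assumed for the Jobs API 2.1: `git_source` with `git_url`, `git_provider`, and one of `git_branch`, `git_tag`, or `git_commit`, plus a notebook task whose `source` is `"GIT"`. The helper, job name, repository URL, and notebook path are hypothetical, and compute settings are deliberately left out, so a cluster or serverless configuration must be added before the spec is submitted to `POST /api/2.1/jobs/create`.

```python
import json


def build_git_job_spec(name, git_url, git_provider, notebook_path,
                       *, branch=None, tag=None, commit=None):
    """Build a Jobs API 2.1 style spec that runs a notebook from Git (hypothetical helper).

    Exactly one of branch, tag, or commit selects the revision the job runs;
    a tag or commit gives a reproducible production run, a branch tracks its tip.
    """
    refs = {"git_branch": branch, "git_tag": tag, "git_commit": commit}
    chosen = {key: value for key, value in refs.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError("pass exactly one of branch, tag, or commit")
    return {
        "name": name,
        "git_source": {"git_url": git_url, "git_provider": git_provider, **chosen},
        "tasks": [
            {
                "task_key": "main",
                # With source "GIT", notebook_path is relative to the repo root.
                "notebook_task": {"notebook_path": notebook_path, "source": "GIT"},
                # Compute settings (job cluster, existing cluster, or serverless)
                # are intentionally omitted from this sketch.
            }
        ],
    }


if __name__ == "__main__":
    spec = build_git_job_spec(
        "nightly-etl",                               # hypothetical job name
        "https://github.com/example-org/analytics",  # hypothetical repository
        "gitHub",
        "notebooks/ingest",                          # hypothetical notebook path
        tag="v1.4.0",
    )
    print(json.dumps(spec, indent=2))
```

      Pinning a production job to a tag or commit rather than a branch means a merge to `main` does not change what runs until the job definition is updated, typically by the CI/CD pipeline.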
