Data Warehousing with Databricks SQL

    Who this is for:

    Architecture / Concept Overview: Data Warehousing with Databricks SQL

    The Databricks SQL stack sits on top of Delta Lake and Unity Catalog, exposing a serverless compute layer to analysts, engineers, and BI tools alike.

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%% flowchart LR classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED A[External Sources] -->|Ingest| B[Auto Loader / COPY INTO] B --> C[Delta Lake Storage] C --> D[Unity Catalog Governance] D --> E[SQL Warehouse Compute] E --> F[SQL Editor / Dashboards / BI Tools] A:::source B:::ingestion C:::storage D:::governance E:::processing F:::serving

    *Figure 1 — End-to-end data warehousing flow from ingestion through governed compute to analyst consumption.*

    The SQL warehouse compute layer itself has three deployment modes — each targeting a different workload profile.

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%% graph TD classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED ROOT[SQL Warehouse Types] --> SL[Serverless] ROOT --> PRO[Pro] ROOT --> CL[Classic] SL --> SL1[Instant start · auto-scales · managed by Databricks] PRO --> PRO1[Customer-managed VPC · predictive scaling] CL --> CL1[Legacy · fixed clusters · manual scaling] ROOT:::governance SL:::serving PRO:::processing CL:::source SL1:::serving PRO1:::processing CL1:::source

    *Figure 2 — SQL warehouse deployment modes and their primary characteristics.*

    Key Terms

    Prerequisites and Setup

    • A Databricks workspace on AWS, Azure, or GCP with Unity Catalog enabled
    • At least one SQL warehouse provisioned (serverless recommended)
    • CAN USE privilege on the target SQL warehouse
    • USE CATALOG and USE SCHEMA privileges on data objects you plan to query
    • Familiarity with standard SQL (ANSI SQL:2011 dialect)

    Step-by-Step Implementation

      Configuration Reference

      Data Warehousing with Databricks SQL configuration options
      SettingScopeRecommended Value
      Warehouse typeWorkspaceServerless (default)
      Auto-stop timeoutWarehouse10 minutes for dev, 30 for production
      Max clustersWarehouse1 for dev, match peak concurrency for prod
      Spot policyWarehouse (Pro/Classic)Cost-optimised for non-SLA workloads
      PhotonWarehouseEnabled (default on serverless)
      Query result cacheWarehouseEnabled
      Unity Catalog metastoreAccountOne per region
      Default catalogWorkspaceSet to team catalog

      Monitoring, Cost, and Security Considerations

      Common Pitfalls and Recommended Patterns

        Frequently Asked Questions