Notebook cell calls the module

Production-quality Databricks notebooks follow software engineering principles: extract logic into testable modules, use version control via Repos, parameterise with widgets, document with markdown cells, and structure execution for top-to-bottom reproducibility. Treat notebooks as thin orchestration layers that call well-tested library code, not as monolithic scripts with hundreds of cells.

  • Structure notebooks for readability, reproducibility, and collaboration
  • Apply software engineering patterns: modularity, testing, version control
  • Avoid common anti-patterns that lead to maintenance debt

Who this is for: Data engineers, analysts, and data scientists who want to write maintainable, production-ready notebooks on Databricks.

Part of the Databricks Notebooks section of the Databricks tutorial series.

Architecture / Concept Overview: Notebook cell calls the module

A well-structured notebook separates concerns into layers: parameters (widgets), imports, configuration, transformation logic (from modules), orchestration, and output. Business logic lives in importable Python modules stored in Repos, making it testable, reusable, and reviewable through standard code review workflows.
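The layering described above can be sketched in plain Python. This is a minimal illustration, not the series' reference implementation: the module path, `RunConfig`, `parse_config`, `filter_orders`, and the table name are all hypothetical, and plain lists of dicts stand in for Spark DataFrames so the sketch runs anywhere.

```python
# --- would live in a module such as my_project/transforms.py (hypothetical path) ---
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RunConfig:
    """Parameters the notebook reads from widgets and passes down."""
    run_date: date
    target_table: str


def parse_config(raw_run_date: str, raw_target_table: str) -> RunConfig:
    """Validate the raw widget strings once, near the top of the notebook."""
    if not raw_target_table.strip():
        raise ValueError("target_table must not be empty")
    return RunConfig(date.fromisoformat(raw_run_date), raw_target_table.strip())


def filter_orders(orders: list, config: RunConfig) -> list:
    """Pure business logic: keep only the orders placed on the run date."""
    return [o for o in orders if o["order_date"] == config.run_date]


# --- notebook cell: thin orchestration only ---
# On Databricks the two raw strings would come from widgets,
# e.g. dbutils.widgets.get("run_date"); literals are used here instead.
config = parse_config("2024-01-15", "sales.daily_orders")
orders = [
    {"id": 1, "order_date": date(2024, 1, 15)},
    {"id": 2, "order_date": date(2024, 1, 14)},
]
todays_orders = filter_orders(orders, config)
print(len(todays_orders), "orders for", config.run_date)
```

Because `parse_config` and `filter_orders` take ordinary arguments and return ordinary values, they can be imported and tested without a cluster; the notebook cell only wires them together.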

[Diagram: 1. Parameters → 2. Imports → 3. Configuration → 4. Transform Logic → 5. Output and Display; Transform Logic → Python Modules in Repos]

*Notebooks follow a standard section order: parameters, imports, configuration, logic, and output.*

[Diagram: Notebook: Thin Wrapper → Library: Business Logic → Tests: Validation → CI/CD: Automation]

*Notebooks are thin wrappers calling tested library code, with CI/CD ensuring quality.*

[Diagram: Anti-Pattern: Monolithic Notebook → Hardcoded Values → No Tests. Best Practice: Modular Code → Parameterised → Unit Tested]

*Anti-pattern: monolithic, hardcoded, untested. Best practice: modular, parameterised, tested.*

Key Terms

Thin Notebook
A notebook that orchestrates work by calling library functions rather than containing all logic inline.
Repos
Git integration for managing notebook and module source code with version control.
Idempotent
A notebook that produces the same result when run multiple times on the same input data.
Top-to-Bottom Execution
Designing notebooks so every cell runs correctly in sequential order without manual intervention.
Feature Branch
A Git branch used to develop and review changes before merging to the main branch.

Prerequisites and Setup

  • A Databricks workspace with Repos enabled
  • A Git repository for source code management
  • pytest available on the cluster for testing
  • Familiarity with Python packaging and module imports

Step-by-Step Implementation

    Configuration Reference

    Notebook cell calls the module configuration options

    | Practice | Description | Priority |
    |---|---|---|
    | Sections in order | Params → Imports → Config → Logic → Output | High |
    | Extract logic to modules | Importable, testable Python files | High |
    | Use widgets for parameters | No hardcoded dates, tables, or configs | High |
    | Top-to-bottom execution | Every cell runs in order without errors | High |
    | Idempotent writes | Safe to re-run without data corruption | High |
    | Version control via Repos | Git branches, PRs, and code review | High |
    | Markdown documentation | Purpose, owner, schedule, assumptions | Medium |
    | Validation checkpoints | Schema, row count, null checks | Medium |
    | Error handling | Try/except with dbutils.notebook.exit() | Medium |
    | Unit tests | pytest for all extracted modules | Medium |
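As an illustration of the "Use widgets for parameters" practice, here is a hedged sketch of a parameter-reading helper. `dbutils.widgets.text` and `dbutils.widgets.get` are the Databricks calls; the helper `read_params`, the widget names, the default table name, and the `FakeDbutils` stand-in (included only so the sketch runs outside Databricks) are assumptions for this example.

```python
from datetime import date


def read_params(dbutils) -> dict:
    """Declare widgets with defaults, then read and validate their values."""
    # widgets.text(name, defaultValue, label) declares a text widget
    dbutils.widgets.text("run_date", "", "Run date (YYYY-MM-DD)")
    dbutils.widgets.text("target_table", "dev.sales.daily_orders", "Target table")

    raw_date = dbutils.widgets.get("run_date")
    return {
        # An empty run_date widget falls back to today's date
        "run_date": date.fromisoformat(raw_date) if raw_date else date.today(),
        "target_table": dbutils.widgets.get("target_table"),
    }


class _FakeWidgets:
    """Local stand-in that mimics only the two widget calls used above."""

    def __init__(self, overrides=None):
        self._values = dict(overrides or {})

    def text(self, name, defaultValue, label=None):
        # Keep a value that was already supplied, otherwise use the default
        self._values.setdefault(name, defaultValue)

    def get(self, name):
        return self._values[name]


class FakeDbutils:
    def __init__(self, overrides=None):
        self.widgets = _FakeWidgets(overrides)


# Outside Databricks, pass the stand-in; inside a notebook, pass the real dbutils.
params = read_params(FakeDbutils({"run_date": "2024-01-15"}))
print(params)
```

Passing `dbutils` in as an argument, rather than relying on the notebook global, is a design choice that keeps the helper importable from a module and testable locally.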

    Monitoring, Cost, and Security Considerations

    Monitoring

    Add logging at key checkpoints so job run output provides visibility into what happened. Use dbutils.notebook.exit() with a JSON result string for structured exit status.
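A minimal sketch of checkpoint logging with a JSON exit string follows. The logger name, the payload fields, the table name, and the row count are placeholders chosen for this example; only `dbutils.notebook.exit()` comes from the text above, and it is left as a comment because `dbutils` is defined only on Databricks.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("daily_orders")  # hypothetical job name


def build_exit_payload(status: str, rows_written: int, target_table: str) -> str:
    """Serialise the run summary; dbutils.notebook.exit() takes a single string."""
    return json.dumps(
        {"status": status, "rows_written": rows_written, "target_table": target_table},
        sort_keys=True,
    )


logger.info("Starting load")
rows_written = 1250  # placeholder; a real run would take this from the write step
logger.info("Wrote %d rows", rows_written)

payload = build_exit_payload("SUCCESS", rows_written, "sales.daily_orders")
print(payload)

# Final notebook cell on Databricks:
# dbutils.notebook.exit(payload)
```

A calling notebook that starts this one with `dbutils.notebook.run(...)` receives the same string back and can parse it with `json.loads` to branch on `status`.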

    Cost Optimisation

    - Modular, well-tested code reduces debugging time and failed job re-runs.

    - Short, focused notebooks run faster and use less compute.

    - Use incremental writes (replaceWhere) instead of full table overwrites to reduce processing time.
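The incremental-write point can be sketched as below. The helper `replace_where_for_day`, the column name, and the table name are hypothetical; the commented write call assumes a Delta table and a DataFrame `df` that contains only the rows for the day being replaced.

```python
from datetime import date


def replace_where_for_day(column: str, day: date) -> str:
    """Build the predicate that limits an overwrite to a single day's rows."""
    return f"{column} = '{day.isoformat()}'"


predicate = replace_where_for_day("order_date", date(2024, 1, 15))
print(predicate)

# On a cluster (df and the table name are placeholders):
# (df.write.format("delta")
#    .mode("overwrite")
#    .option("replaceWhere", predicate)
#    .saveAsTable("sales.daily_orders"))
```

Re-running the same day replaces only the rows matching the predicate, which also keeps the write idempotent.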

    Security and Governance

    - Never hardcode credentials; use dbutils.secrets.get().

    - Use Repos for auditable change history and mandatory code review.

    - Store notebooks that access sensitive data in restricted workspace folders with appropriate permissions.
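To illustrate the first point, here is a hedged sketch of assembling connection options without embedding credentials in the notebook. `dbutils.secrets.get(scope=..., key=...)` is the Databricks call; the scope and key names, the PostgreSQL URL, the helper `build_jdbc_options`, and the `FakeDbutils` stand-in used to run the sketch locally are assumptions.

```python
def build_jdbc_options(dbutils, host: str, database: str) -> dict:
    """Assemble JDBC options, reading credentials from a secret scope."""
    return {
        "url": f"jdbc:postgresql://{host}:5432/{database}",
        "user": dbutils.secrets.get(scope="warehouse", key="db-user"),
        "password": dbutils.secrets.get(scope="warehouse", key="db-password"),
    }


class _FakeSecrets:
    """Local stand-in that mimics only secrets.get(scope, key)."""

    def __init__(self, store):
        self._store = store

    def get(self, scope, key):
        return self._store[(scope, key)]


class FakeDbutils:
    def __init__(self, store):
        self.secrets = _FakeSecrets(store)


fake = FakeDbutils({
    ("warehouse", "db-user"): "etl_user",
    ("warehouse", "db-password"): "not-a-real-password",
})
options = build_jdbc_options(fake, host="db.example.com", database="sales")
print(sorted(options))  # print the option names only, never the secret values
```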

    Common Pitfalls and Recommended Patterns

    • Writing 100+ cell notebooks: keep notebooks to roughly 30 cells or fewer; extract logic into modules.
    • Running cells out of order and relying on hidden state: always verify with "Run All".
    • Hardcoding connection strings, passwords, or table names: use widgets and secrets.
    • Not documenting the notebook purpose: add a markdown header cell with purpose, owner, and schedule.
    • Skipping validation: add checks such as assert df.count() > 0 so that empty or malformed inputs fail early instead of surfacing as downstream issues.
    • Using workspace folders without Git: Repos provide proper versioning, branching, and code review.
    • Copy-pasting code between notebooks: extract shared logic into a common module.
    • Not returning an exit value: dbutils.notebook.exit("SUCCESS") passes a status string back to the calling notebook or job run.
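The validation pitfall above can be addressed with a small checkpoint helper. This is a sketch under assumptions: `validate_frame` and `StubFrame` are hypothetical names, and the helper relies only on `.columns` and `.count()`, which a PySpark DataFrame provides, so a stub can stand in for local runs.

```python
def validate_frame(df, required_columns: set, min_rows: int = 1) -> int:
    """Fail fast if required columns are missing or the frame has too few rows."""
    missing = set(required_columns) - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    row_count = df.count()
    if row_count < min_rows:
        raise ValueError(f"Expected at least {min_rows} rows, got {row_count}")
    return row_count


class StubFrame:
    """Local stand-in exposing the two DataFrame members the check uses."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self._rows = list(rows)

    def count(self):
        return len(self._rows)


orders = StubFrame(["id", "amount"], [(1, 9.5), (2, 12.0)])
checked_rows = validate_frame(orders, {"id", "amount"})
print("validated rows:", checked_rows)
```

Raising `ValueError` rather than using a bare `assert` is a deliberate choice here: assertions are skipped when Python runs with `-O`, while an explicit raise always fires.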

    Frequently Asked Questions

    How many cells should a notebook have?

    Aim for 10-30 cells. If a notebook exceeds this, extract logic into Python modules. Each cell should have a clear, single purpose.

    Should I use notebooks or Python scripts for production?

    Use notebooks for orchestration and visualisation, and Python modules for business logic. This gives you the best of both worlds: interactive development with testable, reviewable code.
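As a sketch of what "testable" means in practice, the example below pairs a small business rule with tests written in the style pytest collects (functions named `test_*`). The rule `normalise_country` and the file locations in the comments are hypothetical; the tests use plain `try`/`except` so the block runs without pytest installed.

```python
# --- would live in a module such as my_project/cleaning.py (hypothetical path) ---
def normalise_country(code: str) -> str:
    """Example rule: trim and upper-case a two-letter country code."""
    cleaned = code.strip().upper()
    if len(cleaned) != 2:
        raise ValueError(f"Expected a 2-letter code, got {code!r}")
    return cleaned


# --- would live in tests/test_cleaning.py and be collected by pytest ---
def test_normalise_country_trims_and_uppercases():
    assert normalise_country(" gb ") == "GB"


def test_normalise_country_rejects_bad_length():
    try:
        normalise_country("GBR")
    except ValueError:
        return  # the expected outcome
    raise AssertionError("expected ValueError for a 3-letter code")


# The notebook itself only calls the function
print(normalise_country(" de "))
```

With pytest available, the second test would more commonly be written with `pytest.raises(ValueError)`.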

    How do I share code between notebooks?

    Use %run for simple cases (utility notebooks) or import from Python modules in Repos for larger codebases. Prefer modules for testability.
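The module-import route can be sketched as follows. To keep the example runnable anywhere, it first writes a tiny package into a temporary directory that stands in for a repo checkout; in Repos the files would already exist in Git, and the repo root is typically importable already, so the `sys.path` line may be unnecessary there. The package name `shared_utils_demo` and the function `strip_whitespace` are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for a repo checkout containing a shared package
repo_root = Path(tempfile.mkdtemp())
package_dir = repo_root / "shared_utils_demo"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "cleaning.py").write_text(
    "def strip_whitespace(values):\n"
    "    return [v.strip() for v in values]\n"
)

# Make the repo root importable for this session
sys.path.insert(0, str(repo_root))
importlib.invalidate_caches()

# Any notebook can now share the same function via a normal import
from shared_utils_demo.cleaning import strip_whitespace

cleaned = strip_whitespace(["  a", "b  "])
print(cleaned)
```

Unlike `%run`, which executes another notebook in the current session, an import only brings in the names you ask for, and the same module can be imported by a test suite.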

    Should every notebook have documentation?

    Yes. At minimum, include a markdown cell at the top with the notebook's purpose, owner, schedule (if applicable), and key assumptions.