Notebook cell calls the module

Production-quality Databricks notebooks follow software engineering principles: extract logic into testable modules, use version control via Repos, parameterise with widgets, document with markdown cells, and structure execution for top-to-bottom reproducibility. Treat notebooks as thin orchestration layers that call well-tested library code, not as monolithic scripts with hundreds of cells.

Structure notebooks for readability, reproducibility, and collaboration
Apply software engineering patterns: modularity, testing, version control
Avoid common anti-patterns that lead to maintenance debt

Who this is for: Data engineers, analysts, and data scientists who want to write maintainable, production-ready notebooks on Databricks.

Part of the Databricks Notebooks section of the Databricks tutorial series.

Architecture / Concept Overview: Notebook cell calls the module

A well-structured notebook separates concerns into layers: parameters (widgets), imports, configuration, transformation logic (from modules), orchestration, and output. Business logic lives in importable Python modules stored in Repos, making it testable, reusable, and reviewable through standard code review workflows.
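
The layering described above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the widget names (`run_date`, `target_table`), the `RunParams` class, and the `select_and_price` function are hypothetical, and rows are modelled as dictionaries so the logic runs without a cluster. `dbutils` is passed in as an argument so the same code can be exercised outside a notebook.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RunParams:
    """Validated parameters for one run (hypothetical fields)."""
    run_date: date
    target_table: str


def read_widgets(dbutils) -> dict:
    """Parameters layer: declare widgets and read their raw string values."""
    dbutils.widgets.text("run_date", "", "Run date (YYYY-MM-DD)")
    dbutils.widgets.text("target_table", "", "Target table")
    return {name: dbutils.widgets.get(name) for name in ("run_date", "target_table")}


def parse_params(raw: dict) -> RunParams:
    """Configuration layer: fail fast on bad parameters before touching data."""
    if not raw["target_table"]:
        raise ValueError("target_table must not be empty")
    return RunParams(date.fromisoformat(raw["run_date"]), raw["target_table"])


def select_and_price(rows: list[dict], params: RunParams) -> list[dict]:
    """Transformation layer: a pure function that would live in a module."""
    day = params.run_date.isoformat()
    return [
        {**row, "net_amount": row["amount"] - row["discount"]}
        for row in rows
        if row["order_date"] == day
    ]
```

In a notebook, the orchestration cell would then reduce to something like `params = parse_params(read_widgets(dbutils))` followed by a call into the module, with the real transformation operating on a Spark DataFrame instead of a list of dictionaries.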

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    Params[1. Parameters]:::source --> Imports[2. Imports]:::ingestion
    Imports --> Config[3. Configuration]:::processing
    Config --> Logic[4. Transform Logic]:::processing
    Logic --> Output[5. Output and Display]:::serving
    Logic --> Modules[Python Modules in Repos]:::governance

*Notebooks follow a standard section order: parameters, imports, configuration, logic, and output.*

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    NB[Notebook: Thin Wrapper]:::serving --> Lib[Library: Business Logic]:::processing
    Lib --> Tests[Tests: Validation]:::governance
    Tests --> CI[CI/CD: Automation]:::governance

*Notebooks are thin wrappers calling tested library code, with CI/CD ensuring quality.*

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    subgraph Anti-Pattern
        Mono[Monolithic Notebook]:::source --> Hard[Hardcoded Values]:::source
        Hard --> NoTest[No Tests]:::source
    end
    subgraph Best Practice
        Modular[Modular Code]:::serving --> Params[Parameterised]:::serving
        Params --> Tested[Unit Tested]:::governance
    end

*Anti-pattern: monolithic, hardcoded, untested. Best practice: modular, parameterised, tested.*

Key Terms

Thin Notebook: A notebook that orchestrates work by calling library functions rather than containing all logic inline.
Repos: Git integration for managing notebook and module source code with version control.
Idempotent: A notebook that can be re-run on the same input data and leave its outputs in the same state, without duplicating or corrupting rows.
Top-to-Bottom Execution: Designing notebooks so every cell runs correctly in sequential order without manual intervention.
Feature Branch: A Git branch used to develop and review changes before merging to the main branch.

Prerequisites and Setup

A Databricks workspace with Repos enabled
A Git repository for source code management
pytest available on the cluster for testing
Familiarity with Python packaging and module imports

Step-by-Step Implementation

Configuration Reference

Notebook best practices and their priority
Practice	Description	Priority
Sections in order	Params → Imports → Config → Logic → Output	High
Extract logic to modules	Importable, testable Python files	High
Use widgets for parameters	No hardcoded dates, tables, or configs	High
Top-to-bottom execution	Every cell runs in order without errors	High
Idempotent writes	Safe to re-run without data corruption	High
Version control via Repos	Git branches, PRs, and code review	High
Markdown documentation	Purpose, owner, schedule, assumptions	Medium
Validation checkpoints	Schema, row count, null checks	Medium
Error handling	Try/except with `dbutils.notebook.exit()`	Medium
Unit tests	pytest for all extracted modules	Medium
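
As a rough illustration of the "Unit tests" row, the sketch below pairs a small extracted function with two pytest-style tests. The function `normalise_country` and the file names in the comments are hypothetical; the point is only that logic moved into a module can be tested with plain assertions and no cluster.

```python
# transforms.py (hypothetical module stored alongside the notebook)
def normalise_country(code):
    """Strip and upper-case a country code; map blank or missing values to None."""
    if code is None:
        return None
    cleaned = code.strip().upper()
    return cleaned or None


# test_transforms.py (hypothetical); pytest collects functions named test_*
def test_normalise_country_strips_and_uppercases():
    assert normalise_country(" gb ") == "GB"


def test_normalise_country_maps_blank_to_none():
    assert normalise_country("   ") is None
    assert normalise_country(None) is None
```

In a real repository the two parts would sit in separate files, with the test file importing from the module.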

Monitoring, Cost, and Security Considerations

Monitoring

Add logging at key checkpoints so job run output provides visibility into what happened. Use dbutils.notebook.exit() with a JSON result string for structured exit status.
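
One way to combine checkpoint logging with a structured exit value is sketched below. The payload fields (`status`, `rows_written`, `error`) and the logger name are assumptions chosen for illustration. The exit function is passed in as `exit_fn` so the helper can be tested locally; in a notebook it would be `dbutils.notebook.exit`.

```python
import json
import logging

logger = logging.getLogger("orders_notebook")  # hypothetical logger name


def build_exit_payload(status, rows_written=0, error=None):
    """Serialise the run outcome as a JSON string (field names are illustrative)."""
    return json.dumps({"status": status, "rows_written": rows_written, "error": error})


def run_with_exit(work, exit_fn):
    """Run work(), log the outcome, then hand a JSON payload to exit_fn.

    exit_fn is called outside the try block so that whatever mechanism it
    uses to stop the notebook is not caught by the broad except clause.
    """
    try:
        rows = work()
        logger.info("wrote %d rows", rows)
        payload = build_exit_payload("SUCCESS", rows_written=rows)
    except Exception as exc:
        logger.exception("run failed")
        payload = build_exit_payload("FAILED", error=str(exc))
    exit_fn(payload)
```

A caller can parse the returned string with `json.loads`. Note that a run ended through `dbutils.notebook.exit()` is generally reported as successful, so a job that must show as failed would need to re-raise the exception instead of exiting with a "FAILED" payload.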

Cost Optimisation

- Modular, well-tested code reduces debugging time and failed job re-runs.

- Short, focused notebooks avoid recomputing unrelated steps, so each run uses less compute.

- Use selective overwrites (the Delta replaceWhere option) to rewrite only the affected partitions or date ranges instead of the full table, which reduces processing time.
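
A selective overwrite with replaceWhere might look like the following sketch. The helper names and the single-date predicate are assumptions; `df` is expected to be a Spark DataFrame that contains only rows for the given date, since Delta by default rejects rows that fall outside the predicate.

```python
from datetime import date


def partition_predicate(column, run_date):
    """Build a replaceWhere predicate for one day, e.g. order_date = '2024-01-02'."""
    if not column.isidentifier():
        # Guard against building SQL from an unexpected column string.
        raise ValueError(f"unexpected column name: {column!r}")
    return f"{column} = '{run_date.isoformat()}'"


def overwrite_partition(df, table, column, run_date):
    """Overwrite only the rows matching the predicate in a Delta table."""
    predicate = partition_predicate(column, run_date)
    (
        df.write.format("delta")
        .mode("overwrite")
        .option("replaceWhere", predicate)
        .saveAsTable(table)
    )
    return predicate
```

Because the same date always replaces the same slice of the table, re-running the notebook for that date leaves the table in the same state, which is also what makes the write idempotent.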

Security and Governance

- Never hardcode credentials; use dbutils.secrets.get().

- Use Repos for auditable change history and mandatory code review.

- Store notebooks that access sensitive data in restricted workspace folders with appropriate permissions.
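
For the first point, a minimal sketch of reading credentials at run time is shown below. The scope name is supplied by the caller and the key names (`db-user`, `db-password`) are hypothetical; `dbutils` is passed in as an argument so the helper can be tested outside a notebook.

```python
def load_db_credentials(dbutils, scope):
    """Fetch credentials from a secret scope so they never appear in notebook source."""
    return {
        "user": dbutils.secrets.get(scope=scope, key="db-user"),          # hypothetical key
        "password": dbutils.secrets.get(scope=scope, key="db-password"),  # hypothetical key
    }


def redacted(options):
    """Return a copy that is safe to log, with the password masked."""
    return {k: ("***" if k == "password" else v) for k, v in options.items()}
```

Logging `redacted(creds)` rather than `creds` keeps the secret out of job output even if a debug statement is left in place.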

Common Pitfalls and Recommended Patterns

Writing 100+ cell notebooks: keep notebooks to roughly 10-30 cells; extract logic into modules.
Running cells out of order and relying on hidden state: always verify with "Run All".
Hardcoding connection strings, passwords, or table names: use widgets and secrets.
Not documenting the notebook purpose: add a markdown header cell with purpose, owner, and schedule.
Skipping validation: add a check such as assert df.count() > 0 so that empty or malformed inputs fail early instead of surfacing as downstream issues.
Using workspace folders without Git: Repos provide proper versioning, branching, and code review.
Copy-pasting code between notebooks: extract shared logic into a common module.
Not returning an exit value: dbutils.notebook.exit("SUCCESS"), or a JSON string, gives calling jobs and notebooks a result they can inspect.
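
The validation pitfall can be addressed with a small checkpoint helper along the lines of the sketch below. The function names are hypothetical. The helper relies only on `df.columns`, `df.count()`, and `df.filter()` with a SQL expression string, which a Spark DataFrame provides; each count triggers a Spark action, so on large tables the checks are best limited to the columns that matter.

```python
def validate(df, required_columns, non_null_columns=()):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Schema check: every required column must be present.
    missing = [c for c in required_columns if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")

    # Row count check: an empty input is usually an upstream failure.
    if df.count() == 0:
        problems.append("DataFrame is empty")

    # Null checks: only for columns that actually exist.
    for column in non_null_columns:
        if column in df.columns:
            nulls = df.filter(f"`{column}` IS NULL").count()
            if nulls:
                problems.append(f"{nulls} null value(s) in {column}")
    return problems


def require_valid(df, required_columns, non_null_columns=()):
    """Raise ValueError listing every problem, otherwise return df unchanged."""
    problems = validate(df, required_columns, non_null_columns)
    if problems:
        raise ValueError("validation failed: " + "; ".join(problems))
    return df
```

Collecting all problems before raising means one failed run reports every issue at once, rather than one per re-run.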

Frequently Asked Questions

How many cells should a notebook have?

Aim for 10-30 cells. If a notebook exceeds this, extract logic into Python modules. Each cell should have a clear, single purpose.

Should I use notebooks or Python scripts for production?

Use notebooks for orchestration and visualisation, and Python modules for business logic. This gives you the best of both worlds: interactive development with testable, reviewable code.

How do I share code between notebooks?

Use %run for simple cases (utility notebooks) or import from Python modules in Repos for larger codebases. Prefer modules for testability.

Should every notebook have documentation?

Yes. At minimum, include a markdown cell at the top with the notebook's purpose, owner, schedule (if applicable), and key assumptions.