Unit testing notebook code on Databricks involves extracting business logic into testable Python functions, then running those tests using frameworks like pytest or unittest either within a notebook or as part of a CI/CD pipeline. By separating transformation logic from notebook orchestration, you can write fast, deterministic tests that catch bugs before they reach production, without requiring a running Spark cluster for every test.
- Extract notebook logic into importable Python modules for testability
- Write and run unit tests using pytest inside Databricks notebooks
- Integrate tests into CI/CD pipelines for automated validation
Who this is for: Data engineers and developers who want to apply software engineering best practices to Databricks notebook code.
Part of the Databricks Notebooks section of the Databricks tutorial series.
Architecture / Concept Overview
The core principle is separating concerns: notebooks orchestrate and visualise, while Python modules contain the business logic. Modules are testable in isolation with mock data, while notebooks handle Spark session management, widget parameters, and display. Tests run in three contexts: interactively in a notebook cell, in a CI job on a Databricks cluster, or locally without Spark using mocked DataFrames.
*Notebooks orchestrate; modules contain logic; tests validate modules; CI/CD automates testing.*
Tests can run at multiple levels, from pure Python unit tests to Spark integration tests.
*The testing pyramid: fast unit tests at the bottom, slower integration tests above, full pipeline tests at the top.*
*Tests run automatically on pull requests, blocking merge until all tests pass.*
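The separation described in this overview can be sketched as follows. This is a minimal illustration, not code from the tutorial: the file names transforms.py and test_transforms.py, the function clean_amount, and its parsing rules are all hypothetical. The function is pure Python, so its tests need no Spark session.

```python
# transforms.py (hypothetical module name): business logic kept out of the notebook.
from typing import Optional


def clean_amount(raw: Optional[str]) -> Optional[float]:
    """Parse a currency string such as " $1,234.50 " into a float.

    Returns None for missing or unparseable input instead of raising,
    so the caller can decide how to handle bad records.
    """
    if raw is None:
        return None
    text = raw.strip().replace("$", "").replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None


# test_transforms.py (hypothetical): pytest collects functions named test_*.
def test_clean_amount_parses_currency():
    assert clean_amount(" $1,234.50 ") == 1234.5


def test_clean_amount_rejects_bad_input():
    assert clean_amount("n/a") is None
    assert clean_amount(None) is None
```

In a real project the two halves would live in separate files, and the notebook would import clean_amount from the module rather than define it in a cell.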
Key Terms
- Unit Test: A test that validates a single function or method in isolation, typically without external dependencies.
- Integration Test: A test that validates the interaction between components, often requiring a Spark session.
- pytest: A popular Python testing framework that Databricks supports natively.
- Test Fixture: A reusable piece of test setup, like a sample DataFrame or SparkSession, provided to test functions.
- Mocking: Replacing real dependencies (like database connections) with fake implementations for isolated testing.
- CI/CD: Continuous Integration/Continuous Deployment pipelines that automate testing and deployment.
Prerequisites and Setup
- A Databricks workspace with Repos enabled for code organisation
- pytest installed on the cluster (pre-installed on modern runtimes)
- Understanding of Python testing basics
- Code structured as importable Python modules (not just notebook cells)
Step-by-Step Implementation
Configuration Reference
| Test Type | Requires Spark | Speed | Use Case |
|---|---|---|---|
| Pure Python unit test | No | Fast (ms) | Utility functions, parsing, validation |
| Spark unit test | Yes | Medium (seconds) | DataFrame transforms, SQL logic |
| Integration test | Yes + data | Slow (minutes) | End-to-end pipeline validation |
| Notebook test | Yes | Medium | dbutils.notebook.run() workflows |
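For the notebook test row, one possible shape is a helper that runs a child notebook and checks its exit value. This is a sketch under assumptions: dbutils is passed in as an argument rather than read from the notebook globals, the helper name and the "OK" exit value are hypothetical, and the child notebook is assumed to finish by calling dbutils.notebook.exit() with a status string.

```python
def run_notebook_check(dbutils, path, expected="OK", timeout_seconds=600, arguments=None):
    """Run a child notebook and compare its exit value with `expected`.

    dbutils.notebook.run() returns the string that the child notebook
    passed to dbutils.notebook.exit().
    """
    result = dbutils.notebook.run(path, timeout_seconds, arguments or {})
    if result != expected:
        raise AssertionError(f"{path} returned {result!r}, expected {expected!r}")
    return result


# In a notebook cell (the path and argument names are placeholders):
# run_notebook_check(dbutils, "./etl_child", arguments={"run_date": "2024-01-01"})
```

Passing dbutils in explicitly keeps the helper importable from a module and lets it be exercised with a stand-in object outside Databricks.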
Monitoring, Cost, and Security Considerations
Monitoring
Track test results in job run history. Use pytest's JUnit XML output (--junitxml=results.xml) for integration with CI/CD dashboards. Monitor test execution time to catch performance regressions.
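As one way to act on the JUnit XML report, the sketch below counts tests and failures and lists test cases slower than a threshold. It uses only the Python standard library; the function name, the one-second threshold, and the sample report are illustrative assumptions, and the sample is hand-written rather than taken from a real run.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text, slow_threshold_seconds=1.0):
    """Summarize a JUnit XML report such as the one written by --junitxml.

    Returns the number of tests, the number that failed or errored, and
    the names of test cases whose recorded time exceeds the threshold.
    """
    root = ET.fromstring(xml_text)
    total = failed = 0
    slow = []
    # iter() finds <testcase> elements whether the root is <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        total += 1
        if case.find("failure") is not None or case.find("error") is not None:
            failed += 1
        if float(case.get("time", "0")) > slow_threshold_seconds:
            slow.append(case.get("name"))
    return {"total": total, "failed": failed, "slow": slow}


# Hand-written sample in the JUnit layout; not output from a real run.
SAMPLE_REPORT = """
<testsuites>
  <testsuite name="pytest" tests="3" failures="1" errors="0">
    <testcase classname="tests.test_transforms" name="test_parse" time="0.002"/>
    <testcase classname="tests.test_transforms" name="test_join" time="2.500"/>
    <testcase classname="tests.test_transforms" name="test_schema" time="0.010">
      <failure message="assert 1 == 2">details</failure>
    </testcase>
  </testsuite>
</testsuites>
""".strip()

summary = summarize_junit(SAMPLE_REPORT)
print(summary)
```

Comparing the slow list between runs is a simple way to spot a test whose duration has regressed.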
Cost Optimisation
- Use single-node clusters (num_workers: 0) for unit tests to minimise DBU consumption.
- Run pure Python tests locally (no cluster) to avoid any Databricks cost.
- Schedule test jobs on spot instances for lower cost.
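A single-node job cluster for tests might be described as below. The sketch writes the cluster specification as a Python dictionary of the kind sent to the Databricks Jobs or Clusters API. The runtime version and node type are placeholders to replace with values valid in your workspace, and spot-instance settings are cloud-specific and left out here.

```python
import json

# Placeholder values: choose a runtime and node type available in your workspace.
SINGLE_NODE_TEST_CLUSTER = {
    "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
    "node_type_id": "i3.xlarge",  # placeholder (AWS) node type
    "num_workers": 0,  # driver only, no worker nodes
    "spark_conf": {
        # Settings Databricks documents for single-node clusters.
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}

print(json.dumps(SINGLE_NODE_TEST_CLUSTER, indent=2))
```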
Security and Governance
- Test data should be synthetic or anonymised; never test against production PII.
- Use separate catalogs or schemas for test data to prevent accidental production writes.
- Store test credentials in Databricks secrets, not in test code.
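Two of these points can be sketched as small helpers. Everything named here is hypothetical: the secret scope and key, the catalog and schema names, and the helper functions themselves. The only Databricks API used is dbutils.secrets.get(scope=..., key=...), and dbutils is passed in so the code can also run outside a notebook with a stand-in.

```python
PROTECTED_CATALOGS = {"prod"}  # hypothetical name of the production catalog


def get_api_token(dbutils, scope="test-scope", key="api-token"):
    """Read a credential from Databricks secrets instead of hard-coding it."""
    return dbutils.secrets.get(scope=scope, key=key)


def scratch_table_name(table, catalog="dev", schema="unit_tests"):
    """Build a three-level table name and refuse protected catalogs."""
    if catalog in PROTECTED_CATALOGS:
        raise ValueError(f"tests must not write to catalog {catalog!r}")
    return f"{catalog}.{schema}.{table}"


print(scratch_table_name("orders"))
```

Routing every test write through one naming helper gives a single place to enforce the rule that tests never touch production objects.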
Common Pitfalls and Recommended Patterns
- Testing logic inside notebook cells: extract into modules first; cells are not importable.
- Skipping tests because "it works in the notebook": production failures from untested code are costly.
- Testing with full production datasets: use small synthetic fixtures for speed and reproducibility.
- Not mocking external dependencies: tests that call APIs or databases are slow and fragile.
- Running tests on multi-node clusters: unit tests need only a single node.
- Forgetting to assert: tests that run code without assertions do not catch bugs.
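The mocking and assertion points above can be illustrated with the standard library's unittest.mock, which also works in tests collected by pytest. The function enrich_with_rate and the rate_client interface are hypothetical; they stand in for any logic that depends on an external API or database.

```python
from unittest.mock import Mock


def enrich_with_rate(order, rate_client):
    """Add a USD amount to an order using a rate looked up from a client.

    rate_client is any object with a get_rate(currency) method; in
    production it would wrap a real API call.
    """
    rate = rate_client.get_rate(order["currency"])
    return {**order, "amount_usd": round(order["amount"] * rate, 2)}


def test_enrich_with_rate_uses_client():
    # Replace the external dependency with a mock that returns a fixed rate.
    fake_client = Mock()
    fake_client.get_rate.return_value = 1.25

    result = enrich_with_rate({"currency": "EUR", "amount": 10.0}, fake_client)

    # Assert on both the output and the interaction; without assertions
    # the test would pass even if the logic were wrong.
    assert result["amount_usd"] == 12.5
    fake_client.get_rate.assert_called_once_with("EUR")
```

Because the client is passed in as an argument, the test never touches the network and runs in milliseconds.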
Frequently Asked Questions
Can I run pytest in a Databricks notebook?
Yes. Call pytest.main() with the test directory path as an argument; the results appear in the cell output.
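A minimal sketch of such a cell follows. The helper names and the tests directory are assumptions; pytest.main(), the --junitxml option, and the -p no:cacheprovider option are standard pytest features. pytest is imported inside run_tests so the helper module still loads where pytest is not installed.

```python
import sys


def build_pytest_args(test_dir, junit_xml=None, verbose=True):
    """Assemble the argument list passed to pytest.main()."""
    args = [test_dir]
    if verbose:
        args.append("-v")
    # Disable pytest's cache plugin, which writes a cache directory
    # that may not be wanted in a workspace folder.
    args += ["-p", "no:cacheprovider"]
    if junit_xml:
        args.append(f"--junitxml={junit_xml}")
    return args


def run_tests(test_dir, junit_xml=None):
    """Run pytest in-process and raise if any test fails."""
    import pytest  # imported here so this module loads even where pytest is absent

    # Avoid writing .pyc files next to the source files.
    sys.dont_write_bytecode = True
    exit_code = pytest.main(build_pytest_args(test_dir, junit_xml))
    assert exit_code == 0, f"pytest exited with code {exit_code}"
    return exit_code


# In a notebook cell (the "tests" directory name is an assumption):
# run_tests("tests", junit_xml="results.xml")
```

Raising on a non-zero exit code makes the notebook run fail when a test fails, so a job that wraps the notebook reports the failure.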
Do I need a running cluster for unit tests?
Pure Python tests (no Spark) do not need a cluster; run them locally. Tests that use Spark DataFrames need either a cluster or a local Spark session.
How do I test SQL transformations?
Write the SQL as a Spark SQL call in a Python function, then test the function with sample DataFrames using spark.createDataFrame().
Should every notebook have tests?
Focus on testing the business logic (transformation functions, validation rules). Notebooks that only orchestrate (calling modules, displaying results) need less testing. Aim for high coverage on extracted modules.