Check the Spark session
Databricks Notebooks provide an interactive, cell-based development environment where you write and execute code directly against Spark clusters or serverless compute, with built-in visualisation, collaboration, and scheduling. They combine the familiarity of Jupyter-style notebooks with Databricks-specific features like multi-language support, Unity Catalog integration, and real-time co-authoring. Start here to understand the notebook interface and workflow.
- Understand the notebook interface, cell types, and execution model
- Create your first notebook and connect it to compute
- Learn the core actions: run cells, view results, and save work
Who this is for: Anyone new to Databricks who wants to start working with notebooks for data exploration, analysis, or pipeline development.
Part of the Databricks Notebooks section of the Databricks tutorial series.
Architecture / Concept Overview: Check the Spark session
A notebook is a sequence of cells stored in the Databricks workspace. When you attach a notebook to compute and run a cell, the code is sent to the cluster driver, executed (potentially distributed across workers), and the results are returned to the notebook UI. Session state (variables, imports, temporary views) persists across cell executions until the notebook is detached, the state is cleared, or the cluster restarts.
*Code flows from the notebook cell to the Spark driver, distributes across workers, reads from storage, and returns results.*
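The session behaviour described above can be sketched in plain Python. This is a simulation, not Databricks code: NotebookSession, run_cell, and detach are hypothetical names, and a shared dictionary stands in for the state that attached compute keeps between cell runs.

```python
# Plain-Python simulation of notebook session state (not a Databricks API).
class NotebookSession:
    """Hypothetical stand-in for the state kept while a notebook is attached."""

    def __init__(self):
        self.namespace = {}

    def run_cell(self, source):
        # Every cell executes against the same namespace, so names persist.
        exec(source, self.namespace)

    def detach(self):
        # Detaching (or a cluster restart) discards all session state.
        self.namespace = {}


session = NotebookSession()
session.run_cell("threshold = 10")           # first cell defines a variable
session.run_cell("doubled = threshold * 2")  # second cell reuses it
value_while_attached = session.namespace["doubled"]

session.detach()
try:
    # Running only the second cell now fails: its dependency is gone.
    session.run_cell("doubled = threshold * 2")
    survived_detach = True
except NameError:
    survived_detach = False
```

The same mechanism explains why running cells out of order can appear to work in one session and then fail after a reattach: a later cell may depend on a name that only an earlier cell defines.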
The notebook workspace organises notebooks into folders with version history and permissions.
*Notebooks live in workspace folders: personal (/Users), shared (/Shared), or Git-synced (/Repos).*
*The notebook workflow is an iterative loop: edit, run, view results, and iterate.*
Key Terms
- Notebook
- An interactive document of ordered cells for code execution, visualisation, and documentation.
- Cell
- A single executable block containing code, SQL, or markdown.
- Attach
- Connecting a notebook to a compute resource (cluster or serverless) for execution.
- Session
- The runtime state (variables, imports, Spark context) maintained while a notebook is attached to compute.
- Workspace
- The file system within Databricks that stores notebooks, folders, and other assets.
- Revision History
- Automatic version snapshots of notebook content for auditing and rollback.
Prerequisites and Setup
- A Databricks workspace with notebook access
- A running cluster or serverless compute to attach to
- Basic familiarity with Python or SQL
- A web browser (notebooks run in the Databricks web UI)
Step-by-Step Implementation
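A reasonable first cell in a new notebook checks that the Spark session responds. The following is a minimal sketch, assuming the notebook is attached to compute and that spark is the SparkSession object Databricks predefines in notebooks. The helper check_spark_session and the fake_spark stand-in are hypothetical names introduced here so the example also runs outside Databricks.

```python
from types import SimpleNamespace


def check_spark_session(spark):
    """Return basic facts about a Spark session (hypothetical helper)."""
    # A tiny job confirms that the attached compute actually executes work.
    row_count = spark.range(5).count()
    return {"version": spark.version, "row_count": row_count}


# In a Databricks notebook you would call: check_spark_session(spark)
# Outside Databricks, this stand-in mimics only the two members used above.
fake_spark = SimpleNamespace(
    version="0.0-fake",
    range=lambda n: SimpleNamespace(count=lambda: n),
)
summary = check_spark_session(fake_spark)
```

If the call raises an error or hangs in a real notebook, the usual cause is that the notebook is not attached or the compute is still starting.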
Configuration Reference
| Feature | Shortcut / Location | Description |
|---|---|---|
| Run cell | Shift + Enter | Execute current cell and move to next |
| Run all | Run all button in the notebook toolbar | Execute all cells in order |
| Add cell | Click + between cells | Insert a new cell |
| Change language | Cell menu → Language | Switch cell to Python, SQL, Scala, R |
| Toggle markdown | %md at cell top | Render cell as markdown |
| Clear state | Detach/Reattach or Clear State | Reset all variables and imports |
| Revision history | File → Revision History | View and restore previous versions |
| Comments | Highlight text → Comment | Add inline comments for collaboration |
Monitoring, Cost, and Security Considerations
Monitoring
Each cell shows execution time and the compute resource used. The notebook activity log records who ran what and when. For scheduled notebooks, job run history provides detailed execution logs.
Cost Optimisation
- Detach notebooks from clusters when not actively working to avoid idle cluster costs.
- Use serverless compute for ad-hoc exploration to eliminate cluster startup and idle time.
- Unpersist large cached DataFrames when they are no longer needed to free executor memory.
Security and Governance
- Notebook permissions control who can view, run, edit, or manage each notebook.
- Unity Catalog enforces data access policies regardless of which notebook runs the query.
- Revision history provides an audit trail of all changes to notebook content.
- Use Repos for Git-based version control and code review workflows.
Common Pitfalls and Recommended Patterns
- Running cells out of order: this creates hidden state dependencies; use "Run All" to verify top-to-bottom execution.
- Not detaching from clusters: leaving a notebook attached keeps the cluster alive and consuming DBUs.
- Hardcoding values: use widgets or parameters instead of editing cell code for different runs.
- Storing sensitive data in notebook cells: use secrets (dbutils.secrets.get()) instead.
- Skipping markdown cells: document your analysis inline for future you and your collaborators.
- Ignoring revision history: use it to roll back unintended changes rather than trying to undo manually.
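The hardcoding and secrets pitfalls above can be avoided together, as in the sketch below. It assumes the code runs in a Databricks notebook where dbutils is available. The widget name run_date, its default value, the scope demo-scope, and the key api-token are all hypothetical, as are load_run_settings and the fake_dbutils stand-in that lets the example run outside Databricks.

```python
from types import SimpleNamespace


def load_run_settings(dbutils, secret_scope, secret_key):
    """Read a run parameter from a widget and a credential from a secret scope."""
    # A text widget replaces a hardcoded value; the default applies on first use.
    dbutils.widgets.text("run_date", "2024-01-01")
    run_date = dbutils.widgets.get("run_date")
    # The credential never appears in the notebook source.
    api_token = dbutils.secrets.get(scope=secret_scope, key=secret_key)
    return {"run_date": run_date, "api_token": api_token}


# In a Databricks notebook: load_run_settings(dbutils, "demo-scope", "api-token")
# Outside Databricks, this stand-in mimics only the calls used above.
_widget_values = {}
fake_dbutils = SimpleNamespace(
    widgets=SimpleNamespace(
        text=lambda name, default, label=None: _widget_values.setdefault(name, default),
        get=lambda name: _widget_values[name],
    ),
    secrets=SimpleNamespace(get=lambda scope, key: f"<secret {scope}/{key}>"),
)
settings = load_run_settings(fake_dbutils, "demo-scope", "api-token")
```

Passing dbutils in as an argument keeps the function testable; in a notebook the real object is simply handed through.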
Frequently Asked Questions
Can I use Databricks notebooks offline?
No. Notebooks require a connection to the Databricks workspace and an attached compute resource to execute code. You can export a notebook (for example as HTML or as a source file) to read it offline.
How do Databricks notebooks compare to Jupyter?
Databricks notebooks offer native Spark integration, multi-language support in one document, real-time collaboration, built-in scheduling, Unity Catalog governance, and the Data Science Agent. Jupyter notebooks can be imported into Databricks.
Can multiple people edit the same notebook at the same time?
Yes. Databricks supports real-time co-authoring where multiple users can edit and run cells simultaneously, similar to Google Docs.
Where are notebooks stored?
Notebooks are stored in the Databricks workspace, which is backed by the control plane. For Git integration, use Repos to sync notebooks with an external Git repository.