Databricks Notebooks
Databricks Notebooks are the primary interactive development environment on the platform, supporting Python, SQL, Scala, and R in a single document with real-time collaboration, built-in visualisation, and direct access to Spark and Unity Catalog. Notebooks serve as the entry point for data exploration, pipeline prototyping, and ad-hoc analysis, and can be scheduled as production jobs. This page is the pillar overview for the entire Notebooks section.
- Understand the notebook interface and its role in the Databricks workflow
- Learn the capabilities that set Databricks notebooks apart from Jupyter and other tools
- Navigate to focused tutorials on each notebook feature and best practice
Who this is for: Data engineers, analysts, and data scientists who use notebooks for development, exploration, or production workloads on Databricks.
Architecture / Concept Overview: Databricks Notebooks
Databricks Notebooks run on the Databricks control plane and execute code on attached compute resources (clusters, serverless, or SQL warehouses). Each notebook consists of ordered cells that can contain code, markdown, or SQL. The notebook server manages sessions, variables, and state, while the compute layer handles distributed execution.
*Notebooks connect to clusters or serverless compute, accessing data through Unity Catalog.*
Notebooks support multiple languages within a single document using magic commands, making them versatile for mixed workloads.
*A single notebook can contain Python, SQL, Scala, R, and markdown cells with interactive widgets.*
*Notebooks support the full development lifecycle: explore, prototype, schedule, and monitor.*
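To make the magic-command model concrete, the sketch below splits a notebook exported in the Databricks Python source format into cells and reports which magic each cell uses. In that format, cells are separated by `# COMMAND ----------` lines and cells in a non-default language carry a `# MAGIC` prefix. The helper name `split_cells` and the sample notebook text are illustrative assumptions, not part of any Databricks API.

```python
import re

CELL_SEPARATOR = "# COMMAND ----------"
MAGIC_PREFIX = "# MAGIC"
HEADER = "# Databricks notebook source"

# Hypothetical exported notebook: one Python cell, one SQL cell, one markdown cell.
SAMPLE_NOTEBOOK = """# Databricks notebook source
df = spark.range(10)

# COMMAND ----------

# MAGIC %sql
# MAGIC SELECT 1 AS answer

# COMMAND ----------

# MAGIC %md
# MAGIC ## Notes
"""


def split_cells(source, default_magic="python"):
    """Return (magic, body) pairs, one per cell, in document order."""
    cells = []
    for raw in source.split(CELL_SEPARATOR):
        lines = [ln for ln in raw.strip().splitlines() if ln.strip() != HEADER]
        if not lines:
            continue
        if all(ln.startswith(MAGIC_PREFIX) for ln in lines):
            # Drop the "# MAGIC" prefix and the single space that follows it.
            body = [ln[len(MAGIC_PREFIX) + 1:] for ln in lines]
            match = re.match(r"%(\w+)\s*(.*)", body[0])
            if match:
                rest = [match.group(2)] + body[1:]
                cells.append((match.group(1), "\n".join(rest).strip()))
                continue
        # No magic prefix: the cell uses the notebook's default language.
        cells.append((default_magic, "\n".join(lines).strip()))
    return cells


for magic, body in split_cells(SAMPLE_NOTEBOOK):
    print(f"[{magic}] {body}")
```

The sample text is only parsed, never executed, so the `spark` reference inside it does not need a running cluster.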
Key Terms
- Notebook
- An interactive document composed of ordered cells that execute code, display markdown, or render visualisations.
- Cell
- A single executable unit within a notebook that contains code, SQL, or markdown.
- Magic Command
- A cell prefix (e.g., %python, %sql, %scala, %r, %md) that overrides the notebook's default language.
- Widget
- An interactive input control (text, dropdown, combobox, multiselect) that parameterises notebook execution.
- Notebook Workflow
- Running one notebook from another using dbutils.notebook.run(), passing parameters and receiving results.
- Co-Authoring
- Real-time collaborative editing where multiple users work on the same notebook simultaneously.
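The notebook workflow term above can be sketched as follows. `dbutils.notebook.run()` takes a notebook path, a timeout in seconds, and a dictionary of string arguments, and returns the string that the child notebook passes to `dbutils.notebook.exit()`. The wrapper `run_child`, the child path, and the convention that the child returns JSON are assumptions made for illustration. Because `dbutils` only exists inside Databricks, the example includes a hypothetical local stand-in so the logic can be run anywhere.

```python
import json
from types import SimpleNamespace


def run_child(dbutils, path, params, timeout_seconds=600):
    """Run a child notebook and decode the JSON string it returns."""
    # Arguments are passed to the child as strings.
    arguments = {key: str(value) for key, value in params.items()}
    raw = dbutils.notebook.run(path, timeout_seconds, arguments)
    # Assumption: the child ends with dbutils.notebook.exit(json.dumps(...)).
    return json.loads(raw)


# --- Hypothetical local stand-in for dbutils, used only outside Databricks ---
def _fake_run(path, timeout_seconds, arguments):
    return json.dumps({"path": path, "rows": int(arguments["limit"]) * 2})


fake_dbutils = SimpleNamespace(notebook=SimpleNamespace(run=_fake_run))

result = run_child(fake_dbutils, "./child_notebook", {"limit": 5})
print(result)
```

Inside a notebook, the real `dbutils` object would be passed in place of `fake_dbutils`.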
Prerequisites and Setup
- A Databricks workspace with notebook access
- Compute resources (cluster, serverless, or SQL warehouse) to attach to
- Basic familiarity with Python, SQL, or another supported language
- Unity Catalog enabled for governed data access
Step-by-Step Implementation
Configuration Reference
| Feature | Description | Where to Learn More |
|---|---|---|
| Multi-language support | Python, SQL, Scala, R in one notebook | Supported Languages tutorial |
| Magic commands | %python, %sql, %run, %pip | Magic Commands tutorial |
| Widgets | Interactive parameters | Widgets tutorial |
| Co-authoring | Real-time multi-user editing | Collaboration tutorial |
| Scheduling | Run notebooks as jobs | Creating and Scheduling tutorial |
| Debugging | Interactive breakpoint debugger | Debugging tutorial |
| Unit testing | Test notebook code with frameworks | Unit Testing tutorial |
| AI assistant | Data Science Agent integration | AI Assistant tutorial |
| Import/Export | DBC, Jupyter, Python formats | Import/Export tutorial |
Monitoring, Cost, and Security Considerations
Monitoring
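Notebook execution logs are available in the job run history. Run context, such as the current notebook path, can also be read programmatically through dbutils.notebook.entry_point, although this is an internal interface rather than a documented API. To track which notebooks run most frequently and how much compute they consume, query the system tables.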
Cost Optimisation
- A notebook incurs compute cost for as long as its attached cluster is running, whether or not a cell is executing. Use auto-termination to stop idle clusters.
- For ad-hoc exploration, use serverless notebooks to avoid paying for cluster idle time.
- Schedule notebooks as jobs on job clusters for lower DBU rates in production.
Security and Governance
- Notebook access is controlled by workspace permissions (can read, can run, can edit, can manage).
- Unity Catalog enforces data access policies regardless of notebook language.
- Use Repos (Git integration) for version control and code review before promotion to production.
Common Pitfalls and Recommended Patterns
- Using notebooks for everything: extract reusable logic into Python modules and test them outside notebooks.
- Not using version control: connect notebooks to Git repos for history, review, and rollback.
- Running all exploration on expensive clusters: use serverless for ad-hoc work.
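- Ignoring cell execution order: notebook state reflects the order in which cells were run, not their position on the page, so running cells out of order can leave variables holding unexpected values. Re-run the notebook from top to bottom before relying on its results.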
- Hardcoding parameters: use widgets for runtime parameters instead of editing code.
- Skipping the AI assistant: the Data Science Agent can generate code, explain errors, and suggest optimisations.
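As an illustration of the widgets pattern recommended above, the sketch below declares two parameters and reads them back instead of hardcoding values. It uses the `text`, `dropdown`, and `get` calls of `dbutils.widgets`. The widget names, defaults, and the `FakeWidgets` class are hypothetical; the fake exists only so the example runs outside Databricks.

```python
class FakeWidgets:
    """Hypothetical in-memory stand-in for dbutils.widgets, for local runs."""

    def __init__(self):
        self._values = {}

    def text(self, name, defaultValue, label=None):
        # Keep an existing value if the widget was already defined.
        self._values.setdefault(name, defaultValue)

    def dropdown(self, name, defaultValue, choices, label=None):
        if defaultValue not in choices:
            raise ValueError(f"default {defaultValue!r} is not one of {choices}")
        self._values.setdefault(name, defaultValue)

    def get(self, name):
        return self._values[name]


def read_parameters(widgets):
    """Declare the notebook's parameters and return their current values."""
    widgets.text("run_date", "2024-01-01", "Run date")
    widgets.dropdown("env", "dev", ["dev", "staging", "prod"], "Environment")
    return {"run_date": widgets.get("run_date"), "env": widgets.get("env")}


params = read_parameters(FakeWidgets())
print(params)
```

Inside a notebook the same function would be called as `read_parameters(dbutils.widgets)`, and the values could then be changed from the widget bar or supplied by a job run without editing the code.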
Frequently Asked Questions
How are Databricks notebooks different from Jupyter notebooks?
Databricks notebooks offer built-in collaboration, native Spark integration, multi-language support in one notebook, scheduling, and Unity Catalog governance. Jupyter notebooks can be imported into Databricks.
Can I use notebooks for production workloads?
Yes. Notebooks can be scheduled as jobs and monitored through the jobs UI. For complex production pipelines, consider extracting logic into modules and using notebooks as thin orchestration wrappers.
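A minimal sketch of the "thin wrapper" idea: the transformation lives in a plain function that can be tested without a cluster, and the notebook only imports and calls it. The function `add_revenue`, its field names, and the use of plain Python dictionaries instead of Spark DataFrames are assumptions chosen to keep the example self-contained.

```python
def add_revenue(rows, price_key="unit_price", qty_key="quantity"):
    """Pure transformation: return new row dicts with a 'revenue' field."""
    return [{**row, "revenue": row[price_key] * row[qty_key]} for row in rows]


def test_add_revenue():
    rows = [{"unit_price": 2.5, "quantity": 4}]
    result = add_revenue(rows)
    assert result == [{"unit_price": 2.5, "quantity": 4, "revenue": 10.0}]
    # The input must not be mutated, so reruns of a cell stay predictable.
    assert "revenue" not in rows[0]


test_add_revenue()
print("add_revenue tests passed")
```

In practice the function would sit in a module under version control, the test would run under a framework such as pytest, and the scheduled notebook would contain little more than the import, the parameter handling, and the call.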
Which language should I use?
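It depends on the workload. Python is the most popular choice and has the broadest library ecosystem. SQL suits data analysis and transformations, Scala is useful for performance-critical code, and R suits statistical analysis. Because magic commands let you switch language per cell, you can combine them in one notebook.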
Can I run notebooks from the CLI?
Yes. Use databricks jobs create to define a job with a notebook task, then start it with databricks jobs run-now. To run a notebook from inside another notebook, use dbutils.notebook.run().
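The job definition that the CLI submits is a JSON document. The sketch below builds one for a single notebook task, using the Jobs API field names `tasks`, `task_key`, `notebook_task`, `notebook_path`, `base_parameters`, and `existing_cluster_id`. The job name, notebook path, cluster ID, and the helper `notebook_job_spec` are placeholders invented for illustration.

```python
import json


def notebook_job_spec(name, notebook_path, cluster_id, parameters=None):
    """Build a job definition with a single notebook task."""
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "run_notebook",
                "notebook_task": {
                    "notebook_path": notebook_path,
                    # Delivered to the notebook as widget values.
                    "base_parameters": parameters or {},
                },
                "existing_cluster_id": cluster_id,
            }
        ],
    }


# All values below are placeholders, not real workspace objects.
spec = notebook_job_spec(
    name="nightly-report",
    notebook_path="/Workspace/Users/someone@example.com/report",
    cluster_id="0000-000000-example",
    parameters={"env": "prod"},
)
print(json.dumps(spec, indent=2))
```

The printed JSON can be saved to a file and passed to `databricks jobs create`. The flag for supplying the payload differs between CLI versions, so check `databricks jobs create --help` for the form your installation expects.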