Creating, Running, and Scheduling Notebooks

Databricks notebooks move from interactive development to automated production by combining cell-based execution with built-in job scheduling. You create notebooks in the workspace or via Git repos, run them interactively by attaching them to compute, and schedule them as recurring jobs with alerting and retry logic, all without leaving the platform. Keeping development and scheduling in one place narrows the gap between exploration and production.

- Create notebooks from the UI, CLI, or Git repos
- Run notebooks interactively and programmatically using dbutils.notebook.run()
- Schedule notebooks as jobs with triggers, retries, and notifications

Who this is for: Data engineers and analysts who need to move notebooks from development to scheduled production execution.

Part of the Databricks Notebooks section of the Databricks tutorial series.

Architecture / Concept Overview: Creating, Running, and Scheduling Notebooks

The notebook lifecycle starts with creation in the workspace, moves through interactive development, and ends with scheduled execution as a job. Jobs wrap notebooks with scheduling, compute provisioning, retry logic, and alerting. A job can contain multiple notebook tasks arranged in a DAG for complex workflows.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    Create[Create Notebook]:::source --> Develop[Interactive Development]:::processing
    Develop --> Test[Test and Validate]:::processing
    Test --> Schedule[Schedule as Job]:::serving
    Schedule --> Monitor[Monitor Runs]:::governance

*Notebooks progress from creation through interactive development, testing, and scheduled production execution.*

Jobs can orchestrate multiple notebooks as tasks in a directed acyclic graph (DAG).

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    Trigger[Schedule Trigger]:::governance --> Ingest[Ingest Notebook]:::ingestion
    Ingest --> Transform[Transform Notebook]:::processing
    Transform --> Validate[Validate Notebook]:::processing
    Validate --> Publish[Publish Notebook]:::serving

*A job DAG orchestrates multiple notebooks as sequential or parallel tasks with dependencies.*
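The chain in the diagram can be written as a task list in which each task names the task it depends on. The following is a minimal sketch, assuming Jobs API 2.1-style field names (`task_key`, `notebook_task`, `depends_on`); the helper name `build_chain` and the notebook paths are hypothetical.

```python
def build_chain(notebook_paths):
    """Return job tasks where each task depends on the one before it.

    notebook_paths: ordered mapping of task_key -> notebook path
    (the paths used below are hypothetical examples).
    """
    tasks = []
    previous_key = None
    for task_key, path in notebook_paths.items():
        task = {
            "task_key": task_key,
            "notebook_task": {"notebook_path": path},
        }
        if previous_key is not None:
            # This task starts only after the upstream task has succeeded.
            task["depends_on"] = [{"task_key": previous_key}]
        tasks.append(task)
        previous_key = task_key
    return tasks


tasks = build_chain({
    "ingest": "/Repos/team/etl/ingest",
    "transform": "/Repos/team/etl/transform",
    "validate": "/Repos/team/etl/validate",
    "publish": "/Repos/team/etl/publish",
})
```

To run tasks in parallel instead, two tasks would list the same upstream `task_key` in their `depends_on` entries.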

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    Cron[Cron Schedule]:::governance --> Job[Job Run]:::processing
    FileArr[File Arrival]:::source --> Job
    Manual[Manual Trigger]:::source --> Job
    Job --> Cluster[Job Cluster]:::processing
    Cluster --> NB[Notebook Execution]:::serving

*Jobs can be triggered by cron schedules, file arrival events, or manual triggers.*
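The three trigger types differ only in which fragment of the job settings is present. The sketch below assumes the Jobs API 2.1 layout, where a cron schedule lives under `schedule` and a file arrival trigger under `trigger.file_arrival`; the helper names, the job name, and the storage URL are hypothetical.

```python
def cron_schedule(cron_expression, timezone_id="UTC"):
    """Settings fragment for a cron-triggered job (Quartz syntax)."""
    return {
        "schedule": {
            "quartz_cron_expression": cron_expression,
            "timezone_id": timezone_id,
            "pause_status": "UNPAUSED",
        }
    }


def file_arrival(url):
    """Settings fragment for a job that starts when new files land at url."""
    return {
        "trigger": {
            "file_arrival": {"url": url},
            "pause_status": "UNPAUSED",
        }
    }


def with_trigger(base_settings, trigger_fragment):
    """Return a copy of base_settings with the trigger fragment merged in."""
    merged = dict(base_settings)
    merged.update(trigger_fragment)
    return merged


base = {"name": "nightly-etl"}  # hypothetical job name
scheduled = with_trigger(base, cron_schedule("0 0 6 * * ?", "Europe/London"))
on_arrival = with_trigger(base, file_arrival("s3://example-bucket/landing/"))
# A job with neither fragment runs only when started manually.
manual_only = with_trigger(base, {})
```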

Key Terms

Job: A managed Databricks resource that runs one or more tasks (notebooks, scripts, or pipelines) on a schedule or trigger.
Task: A single unit of work within a job, typically a notebook execution with parameters.
Trigger: The condition that starts a job run: cron schedule, file arrival, continuous, or manual.
dbutils.notebook.run(): A method to run one notebook from another, passing parameters and receiving a string result.
Job Cluster: An ephemeral cluster created specifically for a job run and terminated after completion.
Retry Policy: Configuration that automatically re-runs failed tasks a specified number of times.
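When a notebook is called from another notebook rather than from a job task, the job's retry policy does not apply to the inner call, so a small wrapper is a common pattern. This is a sketch only: `run_with_retry` is a hypothetical helper, and the runner is passed in as an argument so the code can be exercised outside Databricks. Inside a notebook you would pass `dbutils.notebook.run`, which takes a path, a timeout in seconds, and a dictionary of string arguments.

```python
import time


def run_with_retry(run_notebook, path, timeout_seconds, arguments=None,
                   max_retries=2, wait_seconds=0.0):
    """Call run_notebook(path, timeout_seconds, arguments), retrying on failure.

    Returns the string result of the first successful attempt. If every
    attempt fails, the last exception is re-raised.
    """
    arguments = arguments or {}
    attempt = 0
    while True:
        try:
            return run_notebook(path, timeout_seconds, arguments)
        except Exception:
            if attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(wait_seconds)


# Inside a Databricks notebook (not executed here):
# result = run_with_retry(dbutils.notebook.run, "/Repos/team/etl/transform",
#                         3600, {"date": "2024-12-01"}, max_retries=2)
```

The called notebook sets the returned string with `dbutils.notebook.exit("...")`.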

Prerequisites and Setup

- A Databricks workspace with notebook and job creation permissions
- A compute resource for interactive development
- Understanding of cron expressions for scheduling
- Notebook code ready for parameterisation via widgets or task values
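Job schedules use Quartz cron syntax, which has six or seven fields and starts with seconds, unlike the five-field Unix form. A small sketch that labels the fields of an expression (the helper name `split_quartz_cron` is hypothetical, and it only splits the expression; it does not validate field values):

```python
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
)


def split_quartz_cron(expression):
    """Map each field of a Quartz cron expression to its name.

    The year field is optional, so six or seven fields are accepted.
    """
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(
            f"expected 6 or 7 fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(QUARTZ_FIELDS, parts))


fields = split_quartz_cron("0 0 6 * * ?")
# fields["hours"] is "6": the job fires at 06:00:00 every day.
```

Passing a five-field Unix expression such as `0 6 * * *` raises `ValueError`, which catches a common mistake before the job is created.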

Step-by-Step Implementation

Configuration Reference

Configuration options for creating, running, and scheduling notebooks

| Setting | Description | Example Value |
| --- | --- | --- |
| `notebook_path` | Path to the notebook in the workspace | `/Repos/team/etl/transform` |
| `base_parameters` | Key-value parameters passed to the notebook | `{"date": "2024-12-01"}` |
| `quartz_cron_expression` | Cron schedule in Quartz syntax | `0 0 6 * * ?` (daily at 6 AM) |
| `timezone_id` | Timezone in which the schedule is evaluated | `Europe/London` |
| `max_retries` | Number of automatic retries on failure | `2` |
| `min_retry_interval_millis` | Wait between retries, in milliseconds | `60000` (1 minute) |
| `timeout_seconds` | Maximum runtime before the run is cancelled | `3600` (1 hour) |
| `email_notifications` | Alert recipients, keyed by event | `on_failure`, `on_success` |
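The settings above combine into a single job definition. The following sketch shows one plausible nesting, assuming the Jobs API 2.1 layout in which retry and timeout settings sit on the task and the schedule sits on the job; `build_job_settings`, the job name, the task key, and the email address are hypothetical, and the compute specification is omitted for brevity.

```python
import json


def build_job_settings(name, notebook_path, base_parameters,
                       cron_expression, timezone_id, failure_recipients):
    """Assemble a job definition for one scheduled notebook task."""
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {
                    "notebook_path": notebook_path,
                    "base_parameters": base_parameters,
                },
                "max_retries": 2,
                "min_retry_interval_millis": 60000,  # 1 minute
                "timeout_seconds": 3600,             # 1 hour
            }
        ],
        "schedule": {
            "quartz_cron_expression": cron_expression,
            "timezone_id": timezone_id,
        },
        "email_notifications": {"on_failure": failure_recipients},
    }


settings = build_job_settings(
    name="daily-transform",
    notebook_path="/Repos/team/etl/transform",
    base_parameters={"date": "2024-12-01"},
    cron_expression="0 0 6 * * ?",
    timezone_id="Europe/London",
    failure_recipients=["data-team@example.com"],
)
payload = json.dumps(settings, indent=2)  # body for a job-creation request
```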

Monitoring, Cost, and Security Considerations

Monitoring

Job run history shows status, duration, and output for each run. Use email or webhook notifications for failure alerts. For cross-workspace job analytics, query the jobs system tables: system.lakeflow.jobs in current releases, exposed as system.workflow.jobs in earlier ones.
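Run history can also be summarised programmatically. The sketch below assumes run records shaped like those returned by the Jobs API runs list endpoint (`run_id`, `start_time` and `end_time` in epoch milliseconds, and `state.result_state`); the helper name and the sample records are hypothetical and exist only to make the example runnable.

```python
def summarize_runs(runs):
    """Count finished and failed runs and average the finished durations."""
    finished = [r for r in runs if r.get("state", {}).get("result_state")]
    failed = [r for r in finished if r["state"]["result_state"] != "SUCCESS"]
    durations = [
        (r["end_time"] - r["start_time"]) / 1000
        for r in finished
        if r.get("end_time")
    ]
    return {
        "finished": len(finished),
        "failed": len(failed),
        "failed_run_ids": [r["run_id"] for r in failed],
        "avg_duration_seconds": (
            sum(durations) / len(durations) if durations else None
        ),
    }


# Hypothetical sample data: one success, one failure, one still running.
sample_runs = [
    {"run_id": 1, "start_time": 0, "end_time": 120000,
     "state": {"result_state": "SUCCESS"}},
    {"run_id": 2, "start_time": 0, "end_time": 60000,
     "state": {"result_state": "FAILED"}},
    {"run_id": 3, "start_time": 0,
     "state": {"life_cycle_state": "RUNNING"}},
]
summary = summarize_runs(sample_runs)
```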

Cost Optimisation

- Use job clusters instead of all-purpose clusters for scheduled notebooks to get lower DBU rates and auto-termination.

- Set timeout_seconds to prevent runaway jobs from consuming excessive compute.

- Where the workload allows, schedule during off-peak hours, when cloud spot capacity is often cheaper and easier to obtain.

Security and Governance

- By default, a job runs as its owner's identity, which determines its Unity Catalog access. The job's Run as setting can assign a different identity.

- Use service principals as job owners for production workloads to decouple from individual accounts.

- Store sensitive parameters in Databricks secrets rather than job configurations.

Common Pitfalls and Recommended Patterns

Running production notebooks on all-purpose clusters: use job clusters for lower cost and guaranteed termination.
Hardcoding parameters: use widgets or base_parameters for runtime configuration.
Not setting timeouts: a stuck notebook can run (and cost) indefinitely without a timeout.
Skipping retry configuration: transient failures (network, spot preemption) are common; set 1-2 retries.
Using dbutils.notebook.run() for complex DAGs: use job task dependencies instead for better visibility and error handling.
Not testing with "Run Now" before scheduling: always verify a manual run succeeds before enabling the cron schedule.
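The manual "Run Now" check can also be scripted. This sketch builds, but does not send, a request to the Jobs API `run-now` endpoint using only the standard library; the workspace host, token placeholder, and job ID are hypothetical, and `build_run_now_request` is an illustrative helper rather than part of any Databricks SDK.

```python
import json
import urllib.request


def build_run_now_request(host, token, job_id, notebook_params=None):
    """Build a POST request that would trigger one run of an existing job."""
    body = {"job_id": job_id}
    if notebook_params:
        # Overrides for the notebook task's parameters on this run only.
        body["notebook_params"] = notebook_params
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/run-now",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_now_request(
    host="https://example.cloud.databricks.com",  # hypothetical workspace URL
    token="<personal-access-token>",              # placeholder, never hardcode
    job_id=123,                                   # hypothetical job ID
    notebook_params={"date": "2024-12-01"},
)
# urllib.request.urlopen(request) would send it; that call is left out here.
```

In practice the token would come from a secret store or environment variable rather than appearing in the script.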

Frequently Asked Questions

Can I pass parameters to a scheduled notebook?

Yes. Set base_parameters in the job task configuration and read each one in the notebook with dbutils.widgets.get("param_name"). To pass values produced by one task to a downstream task, use task values (dbutils.jobs.taskValues) instead.
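A common pattern is to declare each widget with a default and then read it, so the same notebook works interactively and as a job. The sketch below is an assumption-laden illustration: `read_parameters` is a hypothetical helper, and `LocalWidgets` is a stand-in that mimics only the `text` and `get` calls of `dbutils.widgets` so the code runs outside Databricks.

```python
class LocalWidgets:
    """Minimal stand-in for dbutils.widgets, for local runs only."""

    def __init__(self, supplied=None):
        # Values a job would supply through base_parameters.
        self._values = dict(supplied or {})

    def text(self, name, default_value, label=None):
        # Declaring a widget keeps any value that was already supplied.
        self._values.setdefault(name, default_value)

    def get(self, name):
        return self._values[name]


def read_parameters(widgets, defaults):
    """Declare one text widget per parameter and return the current values."""
    values = {}
    for name, default in defaults.items():
        widgets.text(name, default)
        values[name] = widgets.get(name)
    return values


# Locally: "date" is supplied, "mode" falls back to its default.
params = read_parameters(
    LocalWidgets({"date": "2024-12-01"}),
    {"date": "1970-01-01", "mode": "full"},
)
# In a Databricks notebook you would pass dbutils.widgets instead.
```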

What happens if a scheduled run fails?

If retries are configured, the task is retried automatically, up to max_retries times. If every attempt fails, the job run is marked as failed and any configured notifications are sent.

Can I schedule a notebook to run on a SQL warehouse?

For SQL workloads, yes: use a SQL task in the job definition to run SQL files or queries on a SQL warehouse. Notebook tasks otherwise attach to a cluster or serverless compute.

How do I chain multiple notebooks in sequence?

Create a job with multiple tasks and define dependencies between them. For example, if Task B depends on Task A, B runs only after A succeeds.
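To check that a set of dependencies produces the order you expect before creating the job, the task list can be sorted locally. This sketch uses the standard library's `graphlib` (Python 3.9+) and assumes tasks carry `task_key` and `depends_on` entries in the Jobs API style; `execution_order` and the task keys are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(tasks):
    """Return task keys in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies form a cycle,
    which a job DAG must not contain.
    """
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in tasks
    }
    return list(TopologicalSorter(graph).static_order())


chained = [
    {"task_key": "task_b", "depends_on": [{"task_key": "task_a"}]},
    {"task_key": "task_a"},
]
order = execution_order(chained)  # task_a comes before task_b
```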