Creating, Running, and Scheduling Notebooks

Databricks notebooks support both interactive development and automated production runs by combining cell-based execution with built-in job scheduling. You create notebooks in the workspace or via Git repos, run them interactively by attaching to compute, and schedule them as recurring jobs with alerting and retry logic, all without leaving the platform. This workflow narrows the gap between exploration and production.

  • Create notebooks from the UI, CLI, or Git repos
  • Run notebooks interactively, or programmatically from another notebook with dbutils.notebook.run()
  • Schedule notebooks as jobs with triggers, retries, and notifications

Who this is for: Data engineers and analysts who need to move notebooks from development to scheduled production execution.

Part of the Databricks Notebooks section of the Databricks tutorial series.

Architecture / Concept Overview: Creating, Running, and Scheduling Notebooks

The notebook lifecycle starts with creation in the workspace, moves through interactive development, and ends with scheduled execution as a job. Jobs wrap notebooks with scheduling, compute provisioning, retry logic, and alerting. A job can contain multiple notebook tasks arranged in a DAG for complex workflows.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
  classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  Create[Create Notebook]:::source --> Develop[Interactive Development]:::processing
  Develop --> Test[Test and Validate]:::processing
  Test --> Schedule[Schedule as Job]:::serving
  Schedule --> Monitor[Monitor Runs]:::governance

*Notebooks progress from creation through interactive development, testing, and scheduled production execution.*
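The creation step can also be scripted. The sketch below only builds a request body for the Workspace import endpoint (`POST /api/2.0/workspace/import`) and sends nothing. The workspace path, the notebook source, and the helper name `build_import_payload` are hypothetical and chosen for illustration.

```python
import base64
import json


def build_import_payload(workspace_path: str, source: str, overwrite: bool = False) -> dict:
    """Build a request body for POST /api/2.0/workspace/import (no request is sent)."""
    # The import endpoint expects the notebook source as a base64-encoded string.
    encoded = base64.b64encode(source.encode("utf-8")).decode("ascii")
    return {
        "path": workspace_path,
        "format": "SOURCE",
        "language": "PYTHON",
        "content": encoded,
        "overwrite": overwrite,
    }


# Hypothetical notebook source; the first line marks a .py file as a notebook.
notebook_source = "# Databricks notebook source\nprint('hello from a new notebook')\n"

# Hypothetical workspace path.
payload = build_import_payload("/Users/someone@example.com/etl/transform", notebook_source)

# Print everything except the encoded content to keep the output readable.
print(json.dumps({k: v for k, v in payload.items() if k != "content"}, indent=2))
```

Sending this body requires a workspace URL and an access token, which are deliberately left out of the sketch.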

Jobs can orchestrate multiple notebooks as tasks in a directed acyclic graph (DAG).

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
  classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  Trigger[Schedule Trigger]:::governance --> Ingest[Ingest Notebook]:::ingestion
  Ingest --> Transform[Transform Notebook]:::processing
  Transform --> Validate[Validate Notebook]:::processing
  Validate --> Publish[Publish Notebook]:::serving

*A job DAG orchestrates multiple notebooks as sequential or parallel tasks with dependencies.*
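As a rough illustration of the DAG above, the following sketch expresses the four notebooks as job tasks using the `task_key`, `depends_on`, and `notebook_task` fields of the Jobs API, then checks the dependency order locally with Python's standard `graphlib`. The job name, the helper functions, and all notebook paths other than `/Repos/team/etl/transform` are assumptions made for the example.

```python
from graphlib import TopologicalSorter


def notebook_task(task_key, notebook_path, depends_on=()):
    """Return one task entry in the shape used by the Jobs API `tasks` list."""
    task = {"task_key": task_key, "notebook_task": {"notebook_path": notebook_path}}
    if depends_on:
        task["depends_on"] = [{"task_key": key} for key in depends_on]
    return task


# Hypothetical job definition mirroring Ingest -> Transform -> Validate -> Publish.
job_settings = {
    "name": "daily-etl",
    "tasks": [
        notebook_task("ingest", "/Repos/team/etl/ingest"),
        notebook_task("transform", "/Repos/team/etl/transform", depends_on=["ingest"]),
        notebook_task("validate", "/Repos/team/etl/validate", depends_on=["transform"]),
        notebook_task("publish", "/Repos/team/etl/publish", depends_on=["validate"]),
    ],
}


def execution_order(settings):
    """Resolve a valid run order; graphlib raises CycleError if the graph is not a DAG."""
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in settings["tasks"]
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(job_settings))
```

Running this kind of check locally before submitting a job definition catches circular dependencies early. Tasks that share no dependency path could run in parallel.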

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
  classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  Cron[Cron Schedule]:::governance --> Job[Job Run]:::processing
  FileArr[File Arrival]:::source --> Job
  Manual[Manual Trigger]:::source --> Job
  Job --> Cluster[Job Cluster]:::processing
  Cluster --> NB[Notebook Execution]:::serving

*Jobs can be triggered by cron schedules, file arrival events, or manual triggers.*
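The trigger types map to different blocks in a job definition. A minimal sketch, assuming the `schedule` and `trigger.file_arrival` fields of the Jobs API: the helper names and the storage URL are hypothetical, and the field-count check is a simple sanity test rather than a full Quartz parser. A manually triggered job needs neither block and is started with "Run Now" or the run-now endpoint.

```python
def cron_schedule(quartz_cron_expression, timezone_id="UTC", paused=False):
    """Build the `schedule` block for a cron-triggered job."""
    # Quartz expressions have 6 fields (seconds first) plus an optional year field.
    fields = quartz_cron_expression.split()
    if len(fields) not in (6, 7):
        raise ValueError(
            f"expected 6 or 7 Quartz fields, got {len(fields)}: {quartz_cron_expression!r}"
        )
    return {
        "schedule": {
            "quartz_cron_expression": quartz_cron_expression,
            "timezone_id": timezone_id,
            "pause_status": "PAUSED" if paused else "UNPAUSED",
        }
    }


def file_arrival_trigger(url):
    """Build the `trigger` block for a job that starts when new files arrive."""
    return {"trigger": {"file_arrival": {"url": url}}}


# Daily at 6 AM London time, as in the configuration example in this article.
print(cron_schedule("0 0 6 * * ?", timezone_id="Europe/London"))

# Hypothetical storage location to watch.
print(file_arrival_trigger("s3://example-bucket/landing/"))
```

A five-field Unix-style expression such as `0 6 * * *` is rejected by the check, which is a common mistake when moving from crontab to Quartz syntax.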

Key Terms

Job
A managed Databricks resource that runs one or more tasks (notebooks, scripts, or pipelines) on a schedule or trigger.
Task
A single unit of work within a job, typically a notebook execution with parameters.
Trigger
The condition that starts a job run: cron schedule, file arrival, continuous, or manual.
dbutils.notebook.run()
A method to run one notebook from another, passing parameters and receiving a string result.
Job Cluster
An ephemeral cluster created specifically for a job run and terminated after completion.
Retry Policy
Configuration that automatically re-runs failed tasks a specified number of times.
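To make the dbutils.notebook.run() entry concrete, here is a small sketch of a parent notebook calling a child and decoding its result. Because the call returns a string, a common pattern is for the child to finish with `dbutils.notebook.exit(json.dumps(...))`. The wrapper name and the stand-in object are assumptions so that the example runs outside Databricks; inside a notebook you would pass the real `dbutils`.

```python
import json
from types import SimpleNamespace


def run_child_notebook(dbutils, path, params, timeout_seconds=600):
    """Run a child notebook and decode the JSON string it returns on exit."""
    # Signature: dbutils.notebook.run(path, timeout_seconds, arguments)
    raw = dbutils.notebook.run(path, timeout_seconds, params)
    return json.loads(raw)


# Stand-in for dbutils so the sketch runs locally (not part of Databricks).
def _fake_run(path, timeout_seconds, arguments):
    # Mimics a child notebook ending with dbutils.notebook.exit(json.dumps({...})).
    return json.dumps({"status": "ok", "date": arguments["date"]})


fake_dbutils = SimpleNamespace(notebook=SimpleNamespace(run=_fake_run))

result = run_child_notebook(
    fake_dbutils, "/Repos/team/etl/transform", {"date": "2024-12-01"}
)
print(result["status"], result["date"])
```

Parameter values are passed as strings, so numbers and dates need to be parsed inside the child notebook.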

Prerequisites and Setup

  • A Databricks workspace with notebook and job creation permissions
  • A compute resource for interactive development
  • Understanding of cron expressions for scheduling
  • Notebook code ready for parameterisation via widgets or task values

Step-by-Step Implementation

    Configuration Reference

    Creating, Running, and Scheduling Notebooks configuration options

    | Setting | Description | Example Value |
    |---|---|---|
    | notebook_path | Path to the notebook in workspace | /Repos/team/etl/transform |
    | base_parameters | Key-value parameters passed to the notebook | {"date": "2024-12-01"} |
    | quartz_cron_expression | Cron schedule | 0 0 6 * * ? (daily at 6 AM) |
    | timezone_id | Schedule timezone | Europe/London |
    | max_retries | Number of automatic retries on failure | 2 |
    | min_retry_interval_millis | Wait between retries | 60000 (1 minute) |
    | timeout_seconds | Maximum runtime before killing | 3600 (1 hour) |
    | email_notifications | Alert recipients | on_failure, on_success |
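The settings above could be combined into a single job definition along these lines. This is a sketch under stated assumptions: the job name, cluster key, email address, Spark version, and node type are placeholders, and valid cluster values depend on your cloud and workspace. The final check is a local sanity test, not something the platform requires.

```python
import json

job_settings = {
    "name": "daily-transform",  # placeholder job name
    "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",  # daily at 6 AM
        "timezone_id": "Europe/London",
        "pause_status": "UNPAUSED",
    },
    "email_notifications": {
        # Each event takes a list of recipient addresses (placeholder address).
        "on_failure": ["data-team@example.com"],
        "on_success": ["data-team@example.com"],
    },
    "job_clusters": [
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
                "node_type_id": "i3.xlarge",  # placeholder, cloud-specific
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "transform",
            "job_cluster_key": "etl_cluster",
            "notebook_task": {
                "notebook_path": "/Repos/team/etl/transform",
                "base_parameters": {"date": "2024-12-01"},
            },
            "max_retries": 2,
            "min_retry_interval_millis": 60000,  # 1 minute between retries
            "timeout_seconds": 3600,  # stop the task after 1 hour
        }
    ],
}

# Local sanity check: every task must reference a cluster key that is defined.
defined_keys = {c["job_cluster_key"] for c in job_settings["job_clusters"]}
missing = [t["task_key"] for t in job_settings["tasks"] if t["job_cluster_key"] not in defined_keys]
assert not missing, f"tasks reference undefined job clusters: {missing}"

print(json.dumps(job_settings, indent=2))
```

Keeping the definition as data makes it easy to review in version control before submitting it through the UI, CLI, or API.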

    Monitoring, Cost, and Security Considerations

    Monitoring

    Job run history shows status, duration, and output for each run. Use email or webhook notifications for failure alerts. For analytics across jobs and workspaces, query the jobs system tables, such as system.lakeflow.jobs (the lakeflow schema was previously named workflow).
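Run history can also be inspected programmatically. The sketch below summarises a response shaped like the output of the Jobs API runs list endpoint, assuming each run carries `run_id`, `state.result_state`, and `run_duration` in milliseconds. The sample data is invented purely for illustration, and `NO_RESULT_YET` is a local label for runs that have not finished, not an API value.

```python
from collections import Counter


def summarize_runs(response):
    """Count runs by result state and list the failed run IDs with durations."""
    runs = response.get("runs", [])
    counts = Counter(
        run.get("state", {}).get("result_state", "NO_RESULT_YET") for run in runs
    )
    failed = [
        {"run_id": run["run_id"], "minutes": round(run.get("run_duration", 0) / 60000, 1)}
        for run in runs
        if run.get("state", {}).get("result_state") == "FAILED"
    ]
    return {"counts": dict(counts), "failed": failed}


# Invented sample response for illustration only.
sample_response = {
    "runs": [
        {"run_id": 101, "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}, "run_duration": 420000},
        {"run_id": 102, "state": {"life_cycle_state": "TERMINATED", "result_state": "FAILED"}, "run_duration": 90000},
        {"run_id": 103, "state": {"life_cycle_state": "RUNNING"}},
    ],
    "has_more": False,
}

print(summarize_runs(sample_response))
```

A summary like this can feed a dashboard or a custom alert when the built-in notifications are not enough.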

    Cost Optimisation

    - Use job clusters instead of all-purpose clusters for scheduled notebooks to get lower DBU rates and auto-termination.

    - Set timeout_seconds to prevent runaway jobs from consuming excessive compute.

    - If the job cluster uses spot instances, consider scheduling during off-peak hours, when spot prices are often lower and capacity is easier to obtain.

    Security and Governance

    - By default, a job runs as its owner's identity, and Unity Catalog evaluates access against that identity. The job's Run as setting can change which identity is used.

    - Use service principals as job owners for production workloads to decouple from individual accounts.

    - Store sensitive parameters in Databricks secrets rather than job configurations.
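One way to apply the secrets recommendation is to pass only the secret scope and key names as job parameters and resolve the value inside the notebook with `dbutils.secrets.get(scope=..., key=...)`. In this sketch the scope name, key name, helper function, and stand-in object are all hypothetical; the stand-in exists only so the example runs outside Databricks.

```python
from types import SimpleNamespace


def resolve_secret(dbutils, scope, key):
    """Look up a secret at run time instead of storing it in the job configuration."""
    return dbutils.secrets.get(scope=scope, key=key)


# Stand-in for dbutils.secrets so the sketch runs locally (not part of Databricks).
_fake_store = {("etl-prod", "warehouse-password"): "not-a-real-password"}
fake_dbutils = SimpleNamespace(
    secrets=SimpleNamespace(get=lambda scope, key: _fake_store[(scope, key)])
)

# The job configuration carries only the names, never the secret value itself.
base_parameters = {"secret_scope": "etl-prod", "secret_key": "warehouse-password"}

password = resolve_secret(
    fake_dbutils, base_parameters["secret_scope"], base_parameters["secret_key"]
)

# Avoid printing secret values; report only that the lookup succeeded.
print("secret resolved:", bool(password))
```

With this pattern, rotating a secret does not require editing the job definition.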

    Common Pitfalls and Recommended Patterns

    • Running production notebooks on all-purpose clusters: use job clusters for lower cost and guaranteed termination.
    • Hardcoding parameters: use widgets or base_parameters for runtime configuration.
    • Not setting timeouts: a stuck notebook can run (and cost) indefinitely without a timeout.
    • Skipping retry configuration: transient failures (network, spot preemption) are common; set 1-2 retries.
    • Using dbutils.notebook.run() for complex DAGs: use job task dependencies instead for better visibility and error handling.
    • Not testing with "Run Now" before scheduling: always verify a manual run succeeds before enabling the cron schedule.

    Frequently Asked Questions

    Can I pass parameters to a scheduled notebook?

    Yes. Set base_parameters in the job task configuration and read them in the notebook with dbutils.widgets.get("param_name"). To pass values produced by an upstream task, use task values (dbutils.jobs.taskValues) instead.

    What happens if a scheduled run fails?

    If retries are configured, the task is retried automatically up to max_retries times. If every attempt fails, the job run is marked as failed and any configured failure notifications are sent.
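The retry behaviour described above can be illustrated with a small local simulation. This is not how Databricks implements retries; it only shows the arithmetic: one initial attempt plus up to `max_retries` retries, with a wait of `min_retry_interval_millis` between attempts. The function name, the flaky task, and the injected `sleep` callable are hypothetical.

```python
import time


def run_with_retries(task, max_retries=2, min_retry_interval_millis=0, sleep=time.sleep):
    """Call task() until it succeeds or the retries are exhausted.

    Returns (result, attempts). Re-raises the last error when all attempts fail.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > max_retries:
                raise  # initial attempt plus max_retries retries all failed
            sleep(min_retry_interval_millis / 1000)


# Hypothetical task that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky_task():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


# Record the waits instead of sleeping so the example finishes immediately.
waits = []
result, attempts = run_with_retries(
    flaky_task, max_retries=2, min_retry_interval_millis=60000, sleep=waits.append
)
print(result, attempts, waits)
```

With max_retries set to 2, a task is attempted at most three times before the run is reported as failed.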

    Can I schedule a notebook to run on a SQL warehouse?

    Yes, in two ways. A SQL task runs SQL files or queries on a SQL warehouse. A notebook task can also target a SQL warehouse, but then only SQL cells are supported; notebooks that contain Python or Scala cells need a cluster or serverless compute.

    How do I chain multiple notebooks in sequence?

    Create a job with multiple tasks and define dependencies between them. For example, if Task B depends on Task A, then by default B runs only after A succeeds.