
Databricks provides an interactive debugger for Python notebooks that lets you set breakpoints, step through code line by line, inspect variables, and evaluate expressions directly within the notebook UI. The debugger integrates with the standard Python pdb interface and the Databricks variable explorer, so you can diagnose logic errors, data issues, and exceptions without scattering print() statements through your code.

- Set breakpoints and step through Python code in Databricks notebooks
- Inspect variables, evaluate expressions, and examine stack traces
- Use debugging techniques appropriate for distributed Spark workloads

Who this is for: Data engineers and data scientists who need to troubleshoot Python notebook code and want a more structured approach than print-based debugging.

Part of the Databricks Notebooks section of the Databricks tutorial series.

Architecture / Concept Overview

The interactive debugger runs in the notebook's Python process on the cluster driver. When a breakpoint is hit, execution pauses and the notebook UI presents a debugging panel with the call stack, local variables, and an expression evaluator. Because Spark operations are distributed, the debugger sees only driver-side code (function definitions and the DataFrame API calls that define transformations); it cannot step into the work that executors perform. For executor-side debugging, use logging and the Spark UI.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
  classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  Code[Python Code]:::source --> BP[Breakpoint]:::governance
  BP --> Pause[Execution Paused]:::processing
  Pause --> Inspect[Variable Inspector]:::serving
  Pause --> Step[Step Through]:::processing
  Step --> Resume[Continue Execution]:::source

*The debugger pauses at breakpoints, allowing inspection and stepping before resuming.*

The debugging scope is limited to driver-side Python code. Distributed Spark operations require separate debugging approaches.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
  classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  subgraph Debuggable
    Driver[Driver Code]:::processing
    Functions[Python Functions]:::processing
    Logic[Business Logic]:::processing
  end
  subgraph Use Spark UI
    Executors[Executor Tasks]:::source
    Shuffle[Shuffle Operations]:::source
  end

*The interactive debugger works on driver-side code; use Spark UI for executor-level debugging.*
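Since the debugger cannot follow code onto executors, one workaround is to keep row-level logic in a plain Python function, step through it on the driver with a few sample values, and rely on logging once it runs inside a UDF. The block below is a minimal sketch of that pattern, not a Databricks-specific API: `normalize_amount` and the logger name `udf_debug` are hypothetical, and the Spark UDF wrapping is left out so the example runs in any Python process.

```python
import logging

# Hypothetical logger name. When this function runs inside a UDF, log output
# is expected to go to the executor logs, not the notebook cell output.
logger = logging.getLogger("udf_debug")


def normalize_amount(raw):
    """Row-level logic that might later be wrapped in a Spark UDF (hypothetical)."""
    if raw is None:
        logger.warning("normalize_amount received None")
        return None
    try:
        # Strip thousands separators, then parse and round to two decimals.
        return round(float(str(raw).replace(",", "")), 2)
    except ValueError:
        # Log the offending value instead of failing the whole task.
        logger.warning("could not parse amount: %r", raw)
        return None


# Driver-side check: plain calls like these run in the notebook's Python
# process, so a breakpoint inside normalize_amount can be hit and stepped.
samples = ["1,250.5", "12", None, "n/a"]
results = [normalize_amount(s) for s in samples]
```

Once the function behaves as expected on sample values, the warnings it emits are what remain visible when it runs on executors, where breakpoints are not hit.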

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
  classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  Print[Print Debugging]:::source --> VarExplorer[Variable Explorer]:::processing
  VarExplorer --> Interactive[Interactive Debugger]:::serving
  Interactive --> SparkUI[Spark UI Analysis]:::governance

*Progress from print debugging to the variable explorer to the interactive debugger to Spark UI for complex issues.*

Key Terms

Interactive Debugger: A built-in notebook feature that pauses execution at breakpoints and allows line-by-line stepping.
Breakpoint: A marker on a line of code where the debugger pauses execution.
Step Over: Execute the current line and move to the next without entering function calls.
Step Into: Execute the current line and enter any function call on that line.
Step Out: Continue execution until the current function returns.
Variable Explorer: A panel showing the values of all variables in the current scope.
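To make the stepping terms concrete, here is a small sketch with comments marking where each action would pause. The functions `apply_discount` and `order_total` are hypothetical and exist only for illustration. In plain pdb, the corresponding commands are `n` (next, step over), `s` (step, step into), `r` (return, step out), and `c` (continue).

```python
def apply_discount(price, rate):
    # "Step into" from the calling line in order_total pauses here.
    discounted = price * (1 - rate)
    # "Step out" from here runs to the return and pauses back in the caller.
    return discounted


def order_total(prices, rate):
    total = 0.0
    for price in prices:
        # Suppose a breakpoint is set on the line below:
        #   step over -> runs apply_discount to completion, then pauses on the
        #                next line executed in order_total
        #   step into -> pauses on the first line inside apply_discount
        total += apply_discount(price, rate)
    return total


total = order_total([100.0, 50.0], 0.5)
print(total)
```

With the breakpoint inside the loop, continuing would pause once per iteration, which is a convenient point to watch `total` and `price` change in the variable explorer.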

Prerequisites and Setup

A Python notebook attached to a Databricks cluster
Databricks Runtime 12.2 LTS or later (for the enhanced debugger)
Dedicated (single-user) cluster access mode for full debugging features
Familiarity with basic debugging concepts (breakpoints, stepping, stack traces)

Step-by-Step Implementation

Configuration Reference

Debugger configuration options
Debug Action	Shortcut or Control	Description
Toggle debugger	Shift + Ctrl + D	Open/close debugger panel
Set breakpoint	Click line gutter	Add/remove breakpoint on a line
Continue	F5	Run until next breakpoint
Step over	F10	Execute line, skip function internals
Step into	F11	Enter function call
Step out	Shift + F11	Return from current function
Stop debugging	Shift + F5	End debug session
Evaluate expression	Debug console	Run arbitrary code at breakpoint

Monitoring, Cost, and Security Considerations

Monitoring

The debugger pauses execution on the cluster, keeping the session and compute active. Monitor long debugging sessions to avoid unnecessary cluster costs. Use auto-termination as a safety net.

Cost Optimisation

- Debugging keeps the cluster running during pause periods; be mindful of long debug sessions.

- Use small sample datasets for debugging to reduce execution time and cost.

- Detach from the cluster when debugging is complete.
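As an illustration of the auto-termination safety net, the sketch below builds a cluster specification for a small debugging cluster as a Python dictionary. The field names follow the Databricks Clusters API as I understand it; every value (cluster name, runtime string, node type, timeout) is a placeholder assumption to adapt to your workspace and cloud, not a recommendation from the original tutorial.

```python
import json

# Illustrative cluster spec for a debugging cluster. All values are
# placeholder assumptions; check them against your workspace before use.
debug_cluster_spec = {
    "cluster_name": "debug-sandbox",       # hypothetical name
    "spark_version": "13.3.x-scala2.12",   # an LTS runtime at or above the stated minimum
    "node_type_id": "i3.xlarge",           # cloud-specific placeholder
    "num_workers": 1,                      # small cluster, intended for sample data
    "autotermination_minutes": 30,         # terminate after 30 minutes of inactivity
    "data_security_mode": "SINGLE_USER",   # dedicated (single-user) access mode
}

# Print the payload as JSON, the form in which it would be submitted.
print(json.dumps(debug_cluster_spec, indent=2))
```

Auto-termination acts on inactivity, so it is a backstop for forgotten clusters; it should not be relied on to end a session that is actively paused at a breakpoint.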

Security and Governance

- The debugger exposes variable values including potentially sensitive data; use it on development data only.

- On Standard clusters, the debugger may have reduced functionality due to Lakeguard restrictions.

- Debug sessions are not logged in audit trails; use separate debug notebooks for auditable testing.

Common Pitfalls and Recommended Patterns

Trying to debug Spark UDFs with the interactive debugger: UDFs run on executors, not the driver; use logging instead.
Debugging with full production datasets: use .limit(1000) or sample data for faster iteration.
Leaving breakpoints active in scheduled jobs: breakpoints hang job execution indefinitely; remove them before scheduling.
Not using the variable explorer: it is faster than adding print statements for simple inspections.
Ignoring Spark UI for executor issues: data skew, OOM errors, and shuffle failures are visible in Spark UI, not the debugger.
Reaching for pdb when the visual debugger would do: the built-in visual debugger is more user-friendly for most cases.
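For code-level `breakpoint()` calls (as opposed to breakpoints set in the notebook gutter), standard CPython offers a guard against the scheduled-job pitfall: when the `PYTHONBREAKPOINT` environment variable is set to `0`, the default breakpoint hook returns immediately. The sketch below assumes a job entry point where no debugger is wanted; `clean_records` is a hypothetical function. Whether a given notebook environment replaces the default hook is not something this example can confirm, which is why it restores the default explicitly.

```python
import os
import sys


def clean_records(records):
    """Hypothetical function with a leftover breakpoint() call."""
    cleaned = [r.strip().lower() for r in records if r and r.strip()]
    breakpoint()  # in an interactive session this would open the debugger
    return cleaned


# Assumption: this runs at the start of a non-interactive job.
# The default hook consults PYTHONBREAKPOINT on every call; "0" disables it.
os.environ["PYTHONBREAKPOINT"] = "0"
# Restore the interpreter's default hook in case the environment replaced it,
# because only the default hook honors PYTHONBREAKPOINT.
sys.breakpointhook = sys.__breakpointhook__

result = clean_records(["  Alice ", "", "BOB", None])
print(result)
```

This does not replace removing breakpoints before scheduling; it only keeps a forgotten `breakpoint()` call from blocking the run.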

Frequently Asked Questions

Does the debugger work with SQL cells?

No. The interactive debugger is Python-only. For SQL debugging, use EXPLAIN to inspect query plans and add validation queries to verify intermediate results.

Can I debug on serverless compute?

The interactive debugger requires a cluster with the debugger enabled. Check the latest Databricks documentation for serverless debugger support on your runtime version.

Will breakpoints affect other users on a shared cluster?

On Standard clusters, each user has an isolated session, so your breakpoints only pause your own execution. Other users are unaffected.

How do I debug a Spark DataFrame transformation?

Use .display(), .show(), or .printSchema() at intermediate steps rather than trying to step through distributed operations. The debugger works on the driver-side DataFrame API calls, not the distributed execution.