Databricks provides an interactive debugger for Python notebooks that lets you set breakpoints, step through code line by line, inspect variables, and evaluate expressions directly within the notebook UI. The debugger integrates with the standard Python pdb interface and the Databricks variable explorer, so you can diagnose logic errors, data issues, and exceptions without scattering print() statements through your code.
- Set breakpoints and step through Python code in Databricks notebooks
- Inspect variables, evaluate expressions, and examine stack traces
- Use debugging techniques appropriate for distributed Spark workloads
Who this is for: Data engineers and data scientists who need to troubleshoot Python notebook code and want a more structured approach than print-based debugging.
Part of the Databricks Notebooks section of the Databricks tutorial series.
Architecture / Concept Overview
The interactive debugger runs in the notebook's Python process on the cluster driver. When a breakpoint is hit, execution pauses and the notebook UI presents a debugging panel with the call stack, local variables, and an expression evaluator. Because Spark operations are distributed, the debugger sees driver-side code (transformation calls, function definitions) but cannot step into code that runs on executors. For executor-side debugging, use logging and the Spark UI.
*The debugger pauses at breakpoints, allowing inspection and stepping before resuming.*
The debugging scope is limited to driver-side Python code. Distributed Spark operations require separate debugging approaches.
*The interactive debugger works on driver-side code; use Spark UI for executor-level debugging.*
*Progress from print debugging to the variable explorer to the interactive debugger to Spark UI for complex issues.*
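As a minimal sketch of the logging approach for executor-side code, the function below records inputs it cannot parse instead of relying on a breakpoint. The function name `clean_amount` and the logger name `executor_debug` are hypothetical, chosen only for illustration; registering the function as a Spark UDF is assumed and not shown.

```python
import logging

# Hypothetical logger name; executor log destinations depend on cluster setup.
logger = logging.getLogger("executor_debug")


def clean_amount(raw):
    """Parse a raw string into a float, logging values that fail to parse."""
    try:
        return float(raw.strip())
    except (AttributeError, ValueError):
        # AttributeError covers None input; ValueError covers non-numeric text.
        logger.warning("clean_amount could not parse %r", raw)
        return None


# Called directly on the driver, the same logic can be stepped through
# with the interactive debugger before it is wrapped in a UDF.
print(clean_amount(" 12.5 "))
print(clean_amount("abc"))
```

Keeping the logic in a plain Python function is a deliberate choice here: it can be exercised on the driver with ordinary values, where breakpoints do work, and only then handed to Spark.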
Key Terms
- Interactive Debugger: A built-in notebook feature that pauses execution at breakpoints and allows line-by-line stepping.
- Breakpoint: A marker on a line of code where the debugger pauses execution.
- Step Over: Execute the current line and move to the next without entering function calls.
- Step Into: Execute the current line and enter any function call on that line.
- Step Out: Continue execution until the current function returns.
- Variable Explorer: A panel showing the values of all variables in the current scope.
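To make the stepping terms concrete, here is a small sketch with one function calling another. The function names are hypothetical. The comments describe what each action would do if a breakpoint were set on the marked line; in plain pdb the corresponding commands are `next` (step over), `step` (step into), `return` (step out), and `continue`.

```python
def normalize(values):
    """Scale values so they sum to 1.0; return [] for empty or zero-sum input."""
    total = sum(values)
    if total == 0:
        return []
    return [v / total for v in values]


def summarize(values):
    """Return the largest share after normalization, rounded to 3 places."""
    # Breakpoint on the next line:
    #   Step Into enters normalize() and pauses on its first line.
    #   Step Over runs normalize() to completion and pauses on the line after.
    shares = normalize(values)
    # While paused inside normalize(), Step Out resumes until it returns here.
    largest = max(shares, default=0.0)
    return round(largest, 3)


result = summarize([2, 3, 5])
print(result)
```

While paused inside `normalize`, the variable explorer would list `values` and `total`; after stepping out, the scope switches back to `summarize` and shows `shares`.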
Prerequisites and Setup
- A Python notebook attached to a Databricks cluster
- Databricks Runtime 12.2 LTS or later (for the enhanced debugger)
- Dedicated (single-user) cluster access mode for full debugging features
- Familiarity with basic debugging concepts (breakpoints, stepping, stack traces)
Step-by-Step Implementation
Configuration Reference
| Debug Action | Keyboard Shortcut | Description |
|---|---|---|
| Toggle debugger | Shift + Ctrl + D | Open/close debugger panel |
| Set breakpoint | Click line gutter | Add/remove breakpoint on a line |
| Continue | F5 | Run until next breakpoint |
| Step over | F10 | Execute line, skip function internals |
| Step into | F11 | Enter function call |
| Step out | Shift + F11 | Return from current function |
| Stop debugging | Shift + F5 | End debug session |
| Evaluate expression | Debug console | Run arbitrary code at breakpoint |
Monitoring, Cost, and Security Considerations
Monitoring
The debugger pauses execution on the cluster, keeping the session and compute active. Monitor long debugging sessions to avoid unnecessary cluster costs. Use auto-termination as a safety net.
Cost Optimisation
- Debugging keeps the cluster running during pause periods; be mindful of long debug sessions.
- Use small sample datasets for debugging to reduce execution time and cost.
- Detach from the cluster when debugging is complete.
Security and Governance
- The debugger exposes variable values including potentially sensitive data; use it on development data only.
- On Standard clusters, the debugger may have reduced functionality due to Lakeguard restrictions.
- Debug sessions are not logged in audit trails; use separate debug notebooks for auditable testing.
Common Pitfalls and Recommended Patterns
- Trying to debug Spark UDFs with the interactive debugger: UDFs run on executors, not the driver; use logging instead.
- Debugging with full production datasets: use .limit(1000) or sample data for faster iteration.
- Leaving breakpoints active in scheduled jobs: breakpoints hang job execution indefinitely; remove them before scheduling.
- Not using the variable explorer: it is faster than adding print statements for simple inspections.
- Ignoring Spark UI for executor issues: data skew, OOM errors, and shuffle failures are visible in Spark UI, not the debugger.
- Overusing pdb over the visual debugger: the built-in visual debugger is more user-friendly for most cases.
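One way to guard against breakpoints reaching a scheduled job is to scan the notebook source before scheduling. The sketch below is an assumption about how such a check could look, not a Databricks feature: the helper `find_breakpoint_calls` is hypothetical, and it only detects code-level calls such as `breakpoint()` or `pdb.set_trace()`, not breakpoints set in the notebook gutter.

```python
import ast
from typing import List


def find_breakpoint_calls(source: str) -> List[int]:
    """Return sorted line numbers of breakpoint() or *.set_trace() calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Bare call: breakpoint()
        if isinstance(func, ast.Name) and func.id == "breakpoint":
            hits.append(node.lineno)
        # Attribute call: pdb.set_trace() or similar
        elif isinstance(func, ast.Attribute) and func.attr == "set_trace":
            hits.append(node.lineno)
    return sorted(hits)


# Example source text standing in for an exported notebook cell.
sample = "import pdb\nx = 1\nbreakpoint()\npdb.set_trace()\n"
print(find_breakpoint_calls(sample))
```

Parsing with `ast` rather than searching for text avoids false positives from comments or strings that merely mention `breakpoint`.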
Frequently Asked Questions
Does the debugger work with SQL cells?
No. The interactive debugger is Python-only. For SQL debugging, use EXPLAIN to inspect query plans and add validation queries to verify intermediate results.
Can I debug on serverless compute?
The interactive debugger requires a cluster with the debugger enabled. Check the latest Databricks documentation for serverless debugger support on your runtime version.
Will breakpoints affect other users on a shared cluster?
On Standard clusters, each user has an isolated session, so your breakpoints only pause your own execution. Other users are unaffected.
How do I debug a Spark DataFrame transformation?
Use .display(), .show(), or .printSchema() at intermediate steps rather than trying to step through distributed operations. The debugger works on the driver-side DataFrame API calls, not the distributed execution.