Importing and Exporting Notebooks
Databricks supports importing and exporting notebooks in multiple formats (DBC archives, Jupyter (.ipynb), and Python (.py), SQL, Scala, and R source files), so you can migrate work between workspaces, share with colleagues outside Databricks, or integrate with CI/CD pipelines. Use the workspace UI for one-off transfers, the CLI or REST API for bulk and automated operations, and Repos for ongoing Git-based synchronisation.
- Import notebooks from Jupyter, source files, or DBC archives into Databricks
- Export notebooks in the format best suited for your use case
- Automate import/export with the CLI and REST API
Who this is for: Engineers and analysts who need to move notebooks between environments, share them outside Databricks, or integrate them with version control systems.
Part of the Databricks Notebooks section of the Databricks tutorial series.
Architecture / Concept Overview: Importing and Exporting Notebooks
Notebooks in Databricks are stored in the workspace file system managed by the control plane. Import converts external files into workspace notebook objects; export converts workspace notebooks back into portable file formats. The DBC format is Databricks-proprietary and preserves all notebook metadata, while source formats (.py, .sql) contain only code.
*Import converts external files into workspace notebooks; export converts them back into portable formats.*
*Each format preserves different levels of notebook content, from full metadata (DBC) to code only (source).*
*Four methods for import/export: UI for one-off, CLI for scripting, API for automation, Repos for continuous sync.*
Key Terms
- DBC Archive: A Databricks-proprietary archive format that bundles notebooks with metadata, dashboards, and folder structure.
- Jupyter Notebook (.ipynb): The open JSON format used by Jupyter, containing cells with code and outputs.
- Source Format: Plain text files (.py, .sql, .scala, .r) containing notebook code with cell separators.
- Workspace Import: The process of uploading external files into the Databricks workspace as notebook objects.
- Repos: Git integration for continuous synchronisation between workspace notebooks and external repositories.
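To make the "cell separators" in the Source Format entry concrete, here is a minimal sketch that splits a Python source file into cells. It assumes the file begins with a `# Databricks notebook source` header line and separates cells with `# COMMAND ----------` lines; `split_cells` is a hypothetical helper, not part of any Databricks library.

```python
from typing import List

# Assumption: Python source exports use this header and cell separator.
HEADER = "# Databricks notebook source"
SEPARATOR = "# COMMAND ----------"


def split_cells(source_text: str) -> List[str]:
    """Split a Python source-format notebook into a list of cell bodies."""
    lines = source_text.splitlines()
    # Drop the header line if present; it is not part of any cell.
    if lines and lines[0].strip() == HEADER:
        lines = lines[1:]
    cells, current = [], []
    for line in lines:
        if line.strip() == SEPARATOR:
            cells.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    cells.append("\n".join(current).strip())
    # Ignore empty fragments left by leading or trailing separators.
    return [cell for cell in cells if cell]


example = """# Databricks notebook source
x = 1

# COMMAND ----------

print(x + 1)
"""
cells = split_cells(example)
print(len(cells), "cells found")
```

Because the separators are ordinary comments, the same file still runs as a plain Python script outside Databricks.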
Prerequisites and Setup
- A Databricks workspace with import/export permissions
- The Databricks CLI installed and configured for scripted operations
- Source files or archives to import
- A Git repository for Repos-based synchronisation
Step-by-Step Implementation
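As one possible starting point for an automated import, the sketch below builds and sends a request to the Workspace API import endpoint. It assumes the `/api/2.0/workspace/import` endpoint with `path`, `format`, `language`, `content` (base64-encoded), and `overwrite` fields; the helper names, host, token, and workspace path are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def build_import_payload(file_bytes, workspace_path, fmt="SOURCE",
                         language=None, overwrite=False):
    """Build the JSON body for a workspace import request."""
    payload = {
        "path": workspace_path,
        "format": fmt,
        # The API expects the file content base64-encoded.
        "content": base64.b64encode(file_bytes).decode("ascii"),
        "overwrite": overwrite,
    }
    if language is not None:
        # Language is only meaningful for source files (assumption).
        payload["language"] = language
    return payload


def send_import(host, token, payload):
    """POST the payload to the workspace; returns the HTTP status code."""
    request = urllib.request.Request(
        url=f"{host}/api/2.0/workspace/import",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Build a payload locally; nothing is sent until send_import is called.
payload = build_import_payload(
    b"print('hello')\n",
    "/Users/someone@example.com/demo",  # hypothetical target path
    language="PYTHON",
    overwrite=True,
)
```

Keeping payload construction separate from the network call makes it easy to inspect or log exactly what will be uploaded before anything touches the workspace.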
Configuration Reference
| Format | Extension | Preserves Outputs | Preserves Metadata | Use Case |
|---|---|---|---|---|
| DBC | .dbc | Yes | Yes (full) | Backup, migration between workspaces |
| Jupyter | .ipynb | Yes | Partial | Sharing with Jupyter users |
| Source | .py, .sql, .scala, .r | No | No | Git, CI/CD, code review |
| HTML | .html | Yes | No | Read-only sharing |
| R Markdown | .Rmd | No | No | R workflows |
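The table above can be encoded as a small lookup for scripted imports. This is a sketch only: the format and language names (`SOURCE`, `JUPYTER`, `DBC`, `HTML`, `R_MARKDOWN`, `PYTHON`, and so on) are assumed to match the values the Workspace API accepts, and `import_settings` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Extension -> (format, language). Language is None where the format
# does not need one (assumption based on the Workspace API fields).
FORMAT_BY_EXTENSION = {
    ".py": ("SOURCE", "PYTHON"),
    ".sql": ("SOURCE", "SQL"),
    ".scala": ("SOURCE", "SCALA"),
    ".r": ("SOURCE", "R"),
    ".ipynb": ("JUPYTER", None),
    ".dbc": ("DBC", None),
    ".html": ("HTML", None),
    ".rmd": ("R_MARKDOWN", None),
}


def import_settings(filename):
    """Return the (format, language) pair for a local file name."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in FORMAT_BY_EXTENSION:
        raise ValueError(f"unsupported notebook extension: {suffix!r}")
    return FORMAT_BY_EXTENSION[suffix]


print(import_settings("etl_job.py"))
```

Raising on unknown extensions, rather than guessing, keeps stray files such as data or images from being uploaded as notebooks.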
Monitoring, Cost, and Security Considerations
Monitoring
Track workspace import/export operations through the audit log. Large DBC archive imports can take time; monitor for completion and errors. Repos sync status is visible in the workspace UI.
Cost Optimisation
- Import/export operations do not consume compute resources or DBUs.
- Use Repos for ongoing synchronisation instead of repeated manual imports to save time and reduce errors.
- Clean up unused imported notebooks to reduce workspace clutter.
Security and Governance
- Exported notebooks may contain sensitive outputs, credentials, or data; treat exports as confidential.
- DBC archives include all cell outputs, which could contain PII or query results.
- Source format exports include only code, which is safer for version control.
- Use workspace permissions to control who can import or export notebooks.
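If a Jupyter export has to be shared or committed, one way to reduce the risk described above is to clear cell outputs first. The sketch below assumes the standard Jupyter layout (a top-level `cells` list where code cells carry `outputs` and `execution_count`); `strip_outputs` is a hypothetical helper and the sample notebook is invented for illustration.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a Jupyter notebook dict with code outputs removed."""
    cleaned = copy.deepcopy(notebook)  # leave the caller's object untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Invented sample: one code cell whose output contains an email address.
sample = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "source": ["spark.table('customers').show()"],
            "metadata": {},
            "execution_count": 3,
            "outputs": [
                {"output_type": "stream", "name": "stdout",
                 "text": ["alice@example.com\n"]}
            ],
        },
        {"cell_type": "markdown", "source": ["# Notes"], "metadata": {}},
    ],
}

cleaned = strip_outputs(sample)
print(json.dumps(cleaned, indent=1)[:80])
```

This removes outputs only. Credentials or data typed directly into code cells are untouched and still need a separate review.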
Common Pitfalls and Recommended Patterns
- Exporting DBC archives with sensitive outputs: use source format for version control to avoid leaking data.
- Importing Jupyter notebooks with incompatible libraries: verify dependencies are available on the cluster.
- Not using --overwrite when re-importing: without it, the CLI fails if the notebook already exists.
- Relying on manual import/export instead of Repos: use Git integration for ongoing development.
- Importing large DBC archives into the wrong folder: always verify the target path before importing.
- Forgetting that source format loses outputs and metadata: re-run cells after importing source files.
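Two of the pitfalls above (a wrong target path and a missing --overwrite) can be guarded against by building the CLI command in code before running it. This sketch assumes the legacy CLI argument order, `databricks workspace import [OPTIONS] SOURCE_PATH TARGET_PATH`; newer CLI versions may order arguments differently, so check `databricks workspace import --help` first. `import_command` is a hypothetical helper and it only builds the command, it does not execute it.

```python
import shlex


def import_command(local_path, workspace_path, language, overwrite=False):
    """Build the argv list for a single-notebook source import."""
    # Guard against relative targets, which are an easy way to
    # import into the wrong folder.
    if not workspace_path.startswith("/"):
        raise ValueError("workspace path must be absolute, e.g. /Shared/...")
    argv = ["databricks", "workspace", "import",
            "--language", language, "--format", "SOURCE"]
    if overwrite:
        # Without this flag the import fails if the notebook exists.
        argv.append("--overwrite")
    argv += [local_path, workspace_path]
    return argv


command = import_command("etl.py", "/Shared/jobs/etl", "PYTHON", overwrite=True)
# Print the command for review; pass `command` to subprocess.run to execute.
print(shlex.join(command))
```

Printing the command for review before execution gives a cheap checkpoint for verifying the target path.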
Frequently Asked Questions
Can I import a folder of notebooks at once?
Yes. Use databricks workspace import_dir to import an entire directory of source files, or import a DBC archive that contains multiple notebooks.
Does importing overwrite existing notebooks?
Only if you use the --overwrite flag. Without it, the import fails if a notebook already exists at the target path.
Can I export notebooks with their outputs?
Yes. Use DBC, Jupyter, or HTML formats to preserve cell outputs. Source format exports only the code.
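For scripted exports, a rough sketch of the request and response handling follows. It assumes the Workspace API export endpoint (`/api/2.0/workspace/export`) takes `path` and `format` query parameters and returns JSON with a base64-encoded `content` field; the host, path, and helper names are hypothetical.

```python
import base64
import json
from urllib.parse import urlencode


def export_url(host, workspace_path, fmt="SOURCE"):
    """Build the export URL for one notebook in the requested format."""
    query = urlencode({"path": workspace_path, "format": fmt})
    return f"{host}/api/2.0/workspace/export?{query}"


def decode_export(response_body):
    """Decode the base64 'content' field of an export response into bytes."""
    return base64.b64decode(json.loads(response_body)["content"])


url = export_url("https://example.cloud.databricks.com",
                 "/Shared/jobs/etl", "JUPYTER")

# Simulated response body, standing in for a real API call.
fake_body = json.dumps(
    {"content": base64.b64encode(b"print(1)\n").decode("ascii")}
)
notebook_bytes = decode_export(fake_body)
```

Requesting `JUPYTER`, `DBC`, or `HTML` keeps outputs in the decoded bytes, so the same handling precautions apply as for any export that contains results.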
Should I use DBC or source format for backups?
Use DBC for full backups (preserves outputs and metadata). Use source format for Git-based version control (cleaner diffs, no binary data).