Supported Languages: Python, SQL, Scala, and R

Databricks notebooks support four languages — Python, SQL, Scala, and R — with the ability to mix them freely within a single notebook using magic commands. Each language has full access to the Spark session and Unity Catalog, and data can be shared between languages through temporary views and the Spark catalog. Choose your language based on your workload: Python for general-purpose engineering and ML, SQL for analytics, Scala for performance-critical code, and R for statistical modelling.

  • Understand the capabilities and trade-offs of each supported language
  • Learn how to mix languages within a single notebook using magic commands
  • Share data between cells of different languages

Who this is for: Developers, analysts, and data scientists who want to understand language options and interoperability in Databricks notebooks.

Part of the Databricks Notebooks section of the Databricks tutorial series.

Architecture / Concept Overview

Every Databricks notebook has a default language, chosen at creation time and changeable later from the notebook's language selector. Individual cells can override it with a magic command (%python, %sql, %scala, %r) placed at the start of the cell. All languages share the same Spark session, so they see the same catalogs, schemas, and temporary views. Language-level variables are not shared, so data passes between languages through the Spark catalog: register a DataFrame as a temporary view in Python, then query it in SQL.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
  classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  NB[Notebook]:::serving --> Py[Python Cell]:::processing
  NB --> SQL[SQL Cell]:::processing
  NB --> Sc[Scala Cell]:::processing
  NB --> R[R Cell]:::processing
  Py --> Session[Shared Spark Session]:::governance
  SQL --> Session
  Sc --> Session
  R --> Session
  Session --> DL[(Delta Lake)]:::storage

*All four languages share the same Spark session, enabling data access and interoperability through the catalog.*
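The default-plus-override rule can be sketched in a few lines of plain Python. This is an illustrative model only, not how Databricks dispatches cells internally; the function name `resolve_cell_language` and the `MAGIC_LANGUAGES` mapping are made up for this example.

```python
# Illustrative model of the rule described above; the names here are
# hypothetical and do not correspond to any Databricks API.
MAGIC_LANGUAGES = {
    "%python": "python",
    "%sql": "sql",
    "%scala": "scala",
    "%r": "r",
    "%md": "markdown",
}


def resolve_cell_language(cell_source: str, default_language: str) -> str:
    """Return the language a cell runs in: its magic command, else the default."""
    first_line = cell_source.split("\n", 1)[0]
    tokens = first_line.split()
    first_token = tokens[0] if tokens else ""
    # Anything that is not a known language magic falls back to the
    # notebook's default language.
    return MAGIC_LANGUAGES.get(first_token, default_language)
```

For example, in a notebook whose default language is Python, a cell starting with `%sql` resolves to SQL, while a cell with no magic command stays in Python.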

Data sharing between languages uses temporary views as the interchange format.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
  classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  PyDF[Python DataFrame]:::processing --> TempView[Temp View]:::governance
  TempView --> SQLQuery[SQL Query]:::serving
  SQLQuery --> Result[Results]:::source

*Register a Python DataFrame as a temporary view, then query it from SQL or any other language.*
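A minimal sketch of that handoff, written as functions that take the notebook's predefined `spark` session as an argument. The view name `orders_tmp`, the column names, and both function names are placeholders invented for this example.

```python
def register_for_sql(spark, rows, view_name="orders_tmp"):
    """Build a DataFrame from rows and expose it as a temporary view.

    `spark` is the SparkSession that Databricks notebooks predefine.
    The column names and view name are hypothetical placeholders.
    """
    df = spark.createDataFrame(rows, schema=["customer", "amount"])
    # The view is session-scoped, so %sql, %scala, and %r cells in the
    # same notebook can query it by name.
    df.createOrReplaceTempView(view_name)
    return df


def total_by_customer(spark, view_name="orders_tmp"):
    """Query the temporary view with SQL from Python; returns a DataFrame."""
    return spark.sql(
        f"SELECT customer, SUM(amount) AS total FROM {view_name} GROUP BY customer"
    )
```

In a notebook, a Python cell would call `register_for_sql(spark, [("acme", 120.0), ("globex", 75.5)])`, after which a cell beginning with `%sql` can run `SELECT * FROM orders_tmp` directly.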

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
  classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  Python[Python: ML, ETL, General]:::processing
  SQL[SQL: Analytics, Queries]:::serving
  Scala[Scala: Performance, JVM]:::processing
  R[R: Statistics, Modelling]:::ingestion

*Each language has a primary strength: Python for general purpose, SQL for analytics, Scala for JVM performance, R for statistics.*

Key Terms

Magic Command
A cell prefix (%python, %sql, %scala, %r, %md) that sets the language for that cell.
PySpark
The Python API for Apache Spark, providing DataFrame and SQL operations.
Spark SQL
Spark's SQL interface for querying structured data using standard SQL syntax.
Temporary View
A session-scoped virtual table that makes a DataFrame queryable via SQL from any language.
SparkR
The R API for Apache Spark, enabling distributed data processing from R.

Prerequisites and Setup

  • A Databricks notebook attached to compute
  • Understanding of at least one supported language
  • Unity Catalog enabled for data access
  • For Scala: awareness of JVM and Spark internals is helpful
  • For R: familiarity with tidyverse and base R

Step-by-Step Implementation

    Configuration Reference

    Configuration options for the supported languages

    | Language | Magic Command | API                           | Best For                              |
    |----------|---------------|-------------------------------|---------------------------------------|
    | Python   | %python       | PySpark, pandas, scikit-learn | General ETL, ML, analysis             |
    | SQL      | %sql          | Spark SQL                     | Queries, analytics, dashboards        |
    | Scala    | %scala        | Spark Scala API               | Performance-critical, JVM integration |
    | R        | %r            | SparkR, tidyverse             | Statistical modelling, visualisation  |
    | Markdown | %md           | Markdown syntax               | Documentation, notes                  |

    Monitoring, Cost, and Security Considerations

    Monitoring

    Each cell's execution time is shown regardless of language. Use the Spark UI to inspect query plans and stage execution for all languages. Slow cells in any language may indicate data skew or inefficient transformations.
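As a complement to the Spark UI, the plan for a query can also be printed from a Python cell. The sketch below assumes Spark 3.0 or later (for the `mode` argument of `explain`) and takes the notebook's predefined `spark` session as a parameter; the function name `explain_sql` is invented for this example.

```python
def explain_sql(spark, query):
    """Print the physical plan Spark chose for a SQL query without running it."""
    # spark.sql() only builds the plan for a SELECT; no job is triggered here.
    df = spark.sql(query)
    # mode="formatted" prints a plan outline followed by per-node details.
    df.explain(mode="formatted")
```

Because SQL, Python, Scala, and R all produce the same kind of plan, the output can be read the same way whichever language issued the query.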

    Cost Optimisation

    - Use SQL for simple queries and aggregations. SQL and DataFrame code go through the same optimiser, but plain SQL makes it harder to slip into Python UDFs or driver-side loops that slow a job down.

    - Avoid collecting large DataFrames to Python or R local memory; use Spark's distributed processing.

    - Prefer built-in Spark functions over Python UDFs for better performance and Photon compatibility.
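The last point can be sketched as follows. The example cleans a string column with the built-in `lower` and `trim` SQL functions instead of a Python UDF; the column name `email` and the function name `add_normalised_email` are assumptions made for illustration.

```python
def add_normalised_email(df, column="email"):
    """Add a lower-cased, trimmed copy of a string column using built-ins."""
    # Built-in SQL functions are evaluated inside the engine. An equivalent
    # Python UDF would send each row to a Python worker and back.
    return df.selectExpr("*", f"lower(trim({column})) AS {column}_normalised")
```

The input DataFrame is left unchanged; the result has every original column plus `email_normalised`.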

    Security and Governance

    - Unity Catalog enforces the same access policies regardless of which language runs the query.

    - On Standard (shared) clusters, some Scala features are restricted to prevent bypassing Lakeguard isolation.

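    - User code runs in isolated processes on Standard clusters. R has historically not been available in Standard access mode, so confirm language support for your compute's access mode before relying on %r cells.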

    Common Pitfalls and Recommended Patterns

    • Collecting large datasets to local memory: use .limit() or aggregation before .collect() or toPandas().
    • Using Python UDFs when built-in functions exist: UDFs prevent Photon acceleration and are slower.
    • Mixing too many languages in one notebook: stick to 1-2 languages for readability; use temp views for handoffs.
    • Forgetting that variables do not share across languages: Python variables are not visible in Scala cells.
    • Not using temporary views for cross-language data sharing: temp views are the standard way to hand a DataFrame to another language; for data that must outlive the session, write it to a table instead.
    • Writing complex logic in SQL when Python is more maintainable: use the right tool for the task complexity.
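The first pitfall, collecting too much to local memory, can be guarded against with a small helper such as the sketch below. The name `preview_as_pandas` and the default of 1000 rows are arbitrary choices for this example.

```python
def preview_as_pandas(df, max_rows=1000):
    """Bring at most max_rows rows to the driver as a pandas DataFrame."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    # limit() becomes part of the Spark plan, so only the bounded result
    # is transferred to the driver by toPandas().
    return df.limit(max_rows).toPandas()
```

For summaries rather than samples, aggregating in Spark first and collecting only the aggregated result keeps the transfer small in the same way.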

    Frequently Asked Questions

    Can I share variables between Python and SQL?

    Not directly. Use temporary views (createOrReplaceTempView) to share DataFrames. You can also use spark.sql() in Python to execute SQL and return results as a DataFrame.
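When the goal is to use a Python value inside a SQL statement, one option is to pass it as a named parameter to spark.sql(). The sketch below assumes a runtime based on Spark 3.5 or later, where the `args` dictionary accepts plain Python values; the view name `orders_tmp`, its columns, and the function name are placeholders.

```python
def orders_above(spark, min_amount, view_name="orders_tmp"):
    """Run a SQL query from Python, binding a Python value as :min_amount."""
    # Assumption: Spark 3.5+ semantics for `args` (Python objects allowed).
    # The :min_amount marker is bound by Spark rather than pasted into the
    # SQL string, which avoids quoting mistakes.
    return spark.sql(
        f"SELECT customer, amount FROM {view_name} WHERE amount > :min_amount",
        args={"min_amount": min_amount},
    )
```

The result is an ordinary DataFrame, so it can be registered as a temporary view again if a SQL, Scala, or R cell needs it.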

    Which language is fastest?

    For SQL and DataFrame operations, all languages compile to the same Spark execution plan, so performance is equivalent. Scala avoids Python-to-JVM serialisation overhead for some operations. Python UDFs are slower than built-in functions.

    Can I install additional Python packages?

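    Yes. Use %pip install package_name in a notebook cell. The package is notebook-scoped: it is available to the current notebook session only, does not affect other notebooks attached to the same cluster, and must be reinstalled after the notebook is detached or the cluster restarts.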

    Does R have full Spark support?

    SparkR provides DataFrame operations and SQL access. For advanced Spark features, use PySpark or Scala and share results via temporary views.