Data Science Agent in Databricks Notebooks

The Data Science Agent is an AI-powered assistant built into Databricks notebooks. It generates code, explains errors, writes documentation, suggests optimisations, and answers questions about your data, all within the notebook context. Because it can see your workspace's tables, schemas, and Unity Catalog metadata, its suggestions in Python, SQL, Scala, and R are grounded in your actual environment.

  • Understand how the Data Science Agent integrates with the notebook environment
  • Use the assistant to generate, explain, fix, and optimise code
  • Leverage AI-assisted features for faster data exploration and pipeline development

Who this is for: Data engineers, analysts, and data scientists who want to accelerate notebook development with AI assistance.

Part of the Databricks Notebooks section of the Databricks tutorial series.

Architecture / Concept Overview: Data Science Agent

The Data Science Agent runs as an integrated AI service within the notebook UI. It has access to your notebook's execution context, including the SparkSession, Unity Catalog metadata, cell history, and error outputs. When you ask a question or request code, the agent uses this context to ground its response in your actual data environment rather than in generic examples.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    User[User Prompt]:::source --> Agent[Data Science Agent]:::serving
    Agent --> UC[Unity Catalog Context]:::governance
    Agent --> NB[Notebook History]:::processing
    Agent --> Errors[Error Context]:::processing
    Agent --> Code[Generated Code]:::serving
    Code --> Cell[Notebook Cell]:::serving

*The agent uses Unity Catalog, notebook history, and error context to generate relevant code.*

The agent supports multiple interaction patterns across the development workflow.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    Generate[Generate Code]:::serving
    Explain[Explain Code]:::processing
    Fix[Fix Errors]:::ingestion
    Optimise[Optimise Queries]:::serving
    Document[Write Docs]:::source
    Explore[Explore Data]:::processing

*Six primary interaction modes: generate, explain, fix, optimise, document, and explore.*

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    Question[Ask Question]:::source --> Agent[Agent Processes]:::processing
    Agent --> Accept[Accept Suggestion]:::serving
    Agent --> Edit[Edit and Iterate]:::processing
    Accept --> Run[Run Cell]:::serving
    Edit --> Agent

*The workflow is iterative: ask, review the suggestion, accept or edit, and run.*

Key Terms

Data Science Agent
The built-in AI assistant in Databricks notebooks that generates and explains code in context.
Context-Aware
The agent accesses your Unity Catalog schemas, notebook cells, and error messages to provide relevant suggestions.
Code Generation
The agent writes Python, SQL, Scala, or R code based on natural language prompts.
Error Explanation
The agent analyses error messages and stack traces to suggest fixes.
Autocomplete
Inline code completion powered by the AI model as you type.

Prerequisites and Setup

  • A Databricks workspace with the AI assistant feature enabled
  • A notebook attached to compute (the agent needs the Spark context for schema awareness)
  • Unity Catalog enabled for rich metadata context
  • Feature flag enabled by workspace admin (if not on by default)

Step-by-Step Implementation

    Configuration Reference

    Data Science Agent configuration options
    | Feature | Access Method | Context Used |
    |---|---|---|
    | Chat panel | Ctrl + Shift + Space | Full notebook + UC metadata |
    | Inline autocomplete | Type in cell | Current cell + imports |
    | Fix error | Click "Fix with AI" on error | Error message + cell code |
    | Explain code | Select code → "Explain" | Selected code + notebook context |
    | Generate code | Type prompt in chat | Prompt + UC schemas + notebook |
    | Optimise | Ask in chat | Current code + table metadata |

    Monitoring, Cost, and Security Considerations

    Monitoring

    The AI assistant logs queries and responses for workspace administrators. Monitor usage to understand adoption and identify training opportunities.

    Cost Optimisation

    - The agent can suggest more efficient queries, reducing compute costs by optimising transformations.

    - AI-generated code should be reviewed for correctness; incorrect code that runs without errors can produce wrong results, leading to wasted re-processing.

    - Use the agent to quickly prototype, then refine — faster development means less compute time spent on trial and error.
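One way to catch code that runs without errors but returns wrong results is to check it against a small fixture whose answer you have worked out by hand. The sketch below is illustrative only: `revenue_by_region` is a hypothetical stand-in for a function the agent generated, and the order rows and expected totals are made-up values, not data from any real table.

```python
import math
from collections import defaultdict


def revenue_by_region(orders):
    """Hypothetical stand-in for an aggregation the agent generated."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["region"]] += order["quantity"] * order["unit_price"]
    return dict(totals)


# Tiny fixture whose correct totals were computed by hand (made-up values).
KNOWN_ORDERS = [
    {"region": "EMEA", "quantity": 2, "unit_price": 10.0},
    {"region": "EMEA", "quantity": 1, "unit_price": 5.0},
    {"region": "APAC", "quantity": 3, "unit_price": 4.0},
]
EXPECTED_TOTALS = {"EMEA": 25.0, "APAC": 12.0}


def find_mismatches(actual, expected):
    """Return {key: (actual, expected)} for every key that disagrees."""
    mismatches = {}
    for key in set(actual) | set(expected):
        got, want = actual.get(key), expected.get(key)
        if got is None or want is None or not math.isclose(got, want):
            mismatches[key] = (got, want)
    return mismatches


mismatches = find_mismatches(revenue_by_region(KNOWN_ORDERS), EXPECTED_TOTALS)
assert not mismatches, f"Generated code disagrees with known data: {mismatches}"
```

Running a check like this on a handful of rows before launching the full job is cheap, and it surfaces silent logic errors before they cost a re-processing run.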

    Security and Governance

    - The agent accesses Unity Catalog metadata (schema names, column names) but does not read actual data values.

    - AI-generated code runs with the user's permissions, so Unity Catalog still enforces data access control.

    - Sensitive schema information visible to the agent is subject to workspace access policies.

    - Do not paste credentials or secrets into the chat; use dbutils.secrets.get() references instead.
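As a minimal sketch of the last point, the helper below resolves a credential at run time so that only the lookup, never the value, appears in code or in a chat prompt. The scope name `demo-scope`, the key name `api-token`, and the `FakeSecrets` class are assumptions made for illustration; `FakeSecrets` exists only so the example runs outside a notebook.

```python
class FakeSecrets:
    """Local stand-in that mimics the get(scope, key) call for testing."""

    def __init__(self, values):
        self._values = values

    def get(self, scope, key):
        return self._values[(scope, key)]


def get_api_token(secrets, scope="demo-scope", key="api-token"):
    """Fetch a credential by reference instead of hard-coding its value."""
    return secrets.get(scope=scope, key=key)


# Outside Databricks: exercise the helper with the stand-in object.
local_secrets = FakeSecrets({("demo-scope", "api-token"): "not-a-real-token"})
token = get_api_token(local_secrets)

# Inside a Databricks notebook, pass the real utility instead:
#   token = get_api_token(dbutils.secrets)
```

When asking the agent for help with code like this, share the helper call rather than the token itself.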

    Common Pitfalls and Recommended Patterns

    • Blindly accepting generated code: always review and understand AI-generated code before running it.
    • Not providing enough context in prompts: include table names, column types, and expected output format for better results.
    • Using the agent for complex business logic: it excels at common patterns; review carefully for domain-specific edge cases.
    • Ignoring the agent's optimisation suggestions: AI-suggested improvements often catch genuine performance issues.
    • Asking overly broad questions: be specific — "read X table and join with Y on Z" works better than "analyse my data".
    • Treating the agent as infallible: it can generate incorrect SQL or Python; validate results with known data.
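The advice above about context and specificity can be made concrete with a small prompt template. This is only a sketch of one possible habit, not a feature of the agent: `build_prompt` is a hypothetical helper, and the table and column names are invented for the example.

```python
def build_prompt(task, tables, output_format):
    """Assemble a specific prompt: the task, table schemas, and expected output."""
    lines = [f"Task: {task}", "Tables:"]
    for table_name, columns in tables.items():
        cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
        lines.append(f"- {table_name} ({cols})")
    lines.append(f"Expected output: {output_format}")
    return "\n".join(lines)


# Hypothetical tables and columns, used purely for illustration.
EXAMPLE_PROMPT = build_prompt(
    task="Join orders with customers on customer_id and total revenue per country",
    tables={
        "main.sales.orders": {"order_id": "BIGINT", "customer_id": "BIGINT", "amount": "DOUBLE"},
        "main.sales.customers": {"customer_id": "BIGINT", "country": "STRING"},
    },
    output_format="one row per country with columns country, total_revenue",
)
print(EXAMPLE_PROMPT)
```

A prompt built this way names the tables, the join key, and the shape of the result, which is the kind of detail that a request such as "analyse my data" leaves out.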

    Frequently Asked Questions

    Does the AI assistant see my data?

    The assistant accesses Unity Catalog metadata (table names, column names, data types) but does not read actual data values in your tables.

    Can I use the assistant with SQL notebooks?

    Yes. The assistant generates SQL as well as Python. In SQL-default notebooks, it provides SQL-specific suggestions and error fixes.

    Is the assistant available on all compute types?

    The assistant runs in the notebook UI and works with any attached compute. Some features may require a minimum Databricks Runtime version.

    Can I disable the assistant for my workspace?

    Yes. Workspace administrators can enable or disable the AI assistant feature at the workspace level.

    How accurate is the generated code?

    The assistant produces correct code for common patterns with high reliability. For complex domain-specific logic, edge cases, or advanced Spark features, always validate the output against expected results.