Key Personas: Who Uses Databricks and Why

Databricks is used by data engineers, data analysts and analytics engineers, data scientists and ML engineers, and platform/governance administrators, all collaborating on one governed copy of data. Each persona works through different surfaces (notebooks, SQL Warehouses, ML tooling, or admin consoles), but all share the same lakehouse and Unity Catalog. After reading, you will know what each role does on the platform, which features they rely on, and how their work connects.

  • Identify the main Databricks personas and their core responsibilities
  • Map each persona to the platform surfaces and features they use
  • Understand how the personas hand off work across one shared lakehouse

Who this is for: Team leads, hiring managers, and new users orienting to roles on the platform.

Part of the What is Databricks section in the Databricks tutorial series.

Architecture / Concept Overview

Personas collaborate along the data lifecycle on shared, governed data. Data engineers build and maintain pipelines that produce trustworthy tables; analysts and analytics engineers model and query those tables for insight; data scientists and ML engineers build models on them; and administrators govern access, cost, and security across everyone. Unity Catalog is the common thread that lets them share data safely.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  DE[Data Engineer]:::ingestion --> Tables[(Governed Tables)]:::storage
  AE[Analyst and Analytics Engineer]:::processing --> Tables
  DS[Data Scientist and ML Engineer]:::serving --> Tables
  Admin[Platform and Governance Admin]:::governance -.governs.-> Tables

*Each persona works against the same governed tables, with administrators applying governance across all of them.*

Work flows from one persona to the next as data is refined and consumed.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  DE[Engineer Builds Pipelines]:::ingestion --> AE[Analyst Models and Reports]:::processing
  AE --> DS[Scientist Builds Models]:::serving
  DS --> App[Apps and Decisions]:::serving

*A typical handoff: engineers deliver curated tables, analysts model and report, and scientists build models that power applications.*

Key Terms

Data engineer
Builds and operates ingestion and transformation pipelines that produce reliable, governed tables.
Data analyst
Explores data and builds dashboards and reports, primarily using SQL and BI tools on curated tables.
Analytics engineer
Bridges engineering and analytics by building tested, reusable data models (often the Silver/Gold layers) with SQL.
Data scientist
Explores data and builds statistical and machine learning models to answer predictive questions.
ML engineer
Productionizes models: training pipelines, deployment, serving, and monitoring, often using Mosaic AI and MLflow.
Platform/governance administrator
Manages workspaces, identity, Unity Catalog permissions, cost policies, and security.

Prerequisites and Setup

  • A Databricks workspace with role-appropriate access
  • Unity Catalog groups that map to your personas
  • Compute suited to each role (clusters for engineering/ML, SQL Warehouses for analytics)
  • Clear ownership of catalogs and schemas per team

Step-by-Step Implementation

  1. Map personas to identity groups

    Create groups that mirror your personas so permissions are assigned by role, not individual.

    -- SQL cell - grant catalog usage to persona groups
    GRANT USE CATALOG ON CATALOG main TO `data-engineers`;
    GRANT USE CATALOG ON CATALOG main TO `analysts`;
  2. Give engineers pipeline access

    Engineers need write access to build Bronze/Silver/Gold tables.

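    -- SQL cell - engineers can create and write tables in the engineering schema
    -- (Unity Catalog expects a specific privilege such as CREATE TABLE, plus USE SCHEMA)
    GRANT USE SCHEMA, CREATE TABLE, MODIFY ON SCHEMA main.pipelines TO `data-engineers`;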
  3. Give analysts read access and a warehouse

    Analysts mostly read curated Gold tables through a SQL Warehouse.

    -- SQL cell - analysts read the curated layer only
    -- (USE SCHEMA is needed alongside SELECT to query tables in the schema)
    GRANT USE SCHEMA, SELECT ON SCHEMA main.gold TO `analysts`;
  4. Give data scientists read access and a sandbox

    Data scientists read curated data and write to a sandbox/feature schema for experiments.

    # Python cell - a scientist reads governed data for modeling
    df = spark.table("main.gold.customer_features")
    train = df.sample(fraction=0.8, seed=42)
  5. Let admins enforce guardrails

    Administrators apply cluster policies and audit access so all personas operate within limits.

    -- SQL cell - admins review who can access sensitive data
    SHOW GRANTS ON TABLE main.gold.customer_features;
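
The grants from the steps above can also be kept in one reviewable place and generated from a mapping. The following is a minimal sketch, not a Databricks API: `PERSONA_GRANTS` and `build_grant_statements` are hypothetical names introduced here, the `data-scientists` group and the `main.sandbox` schema are assumptions (the steps do not name them), and the privilege sets shown are one plausible reading of each persona's needs.

```python
# Python cell - generate GRANT statements from a persona-to-privilege mapping.
# Hypothetical mapping: group name -> list of (privileges, securable type, securable name).
# The `data-scientists` group and `main.sandbox` schema are assumptions for illustration.
PERSONA_GRANTS = {
    "data-engineers": [
        (["USE CATALOG"], "CATALOG", "main"),
        (["USE SCHEMA", "CREATE TABLE", "MODIFY"], "SCHEMA", "main.pipelines"),
    ],
    "analysts": [
        (["USE CATALOG"], "CATALOG", "main"),
        (["USE SCHEMA", "SELECT"], "SCHEMA", "main.gold"),
    ],
    "data-scientists": [
        (["USE CATALOG"], "CATALOG", "main"),
        (["USE SCHEMA", "SELECT"], "SCHEMA", "main.gold"),
        (["USE SCHEMA", "CREATE TABLE", "SELECT", "MODIFY"], "SCHEMA", "main.sandbox"),
    ],
}


def build_grant_statements(persona_grants):
    """Return one GRANT statement per (group, securable) pair, in mapping order."""
    statements = []
    for group, grants in persona_grants.items():
        for privileges, securable_type, securable_name in grants:
            statements.append(
                f"GRANT {', '.join(privileges)} "
                f"ON {securable_type} {securable_name} TO `{group}`"
            )
    return statements


statements = build_grant_statements(PERSONA_GRANTS)
for stmt in statements:
    # In a notebook, each statement could be passed to spark.sql(stmt) instead.
    print(stmt)
```

Keeping the mapping in version control gives reviewers a single diff to read when a persona's access changes.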

Configuration Reference

Persona configuration options

| Persona | Primary surface | Key features | Typical access |
| --- | --- | --- | --- |
| Data engineer | Notebooks, Lakeflow | Pipelines, Delta, jobs | Write to pipeline schemas |
| Data analyst | SQL editor, dashboards | SQL Warehouses, BI | Read curated Gold |
| Analytics engineer | SQL, repos | Models, tests, views | Build Silver/Gold |
| Data scientist | Notebooks | MLflow, Mosaic AI | Read curated, write sandbox |
| ML engineer | Notebooks, jobs | Model serving, monitoring | Deploy and serve models |
| Admin | Admin console | Unity Catalog, policies | Manage all access |

Monitoring, Cost, and Security Considerations

Monitoring

Administrators monitor usage and access across personas through audit logs and system tables, while each persona watches their own job, query, or model runs. Centralized audit logs make it straightforward to see which role accessed sensitive data, supporting compliance reviews.
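
As one illustration of an audit review, an administrator might summarize recent Unity Catalog activity per user from the audit system table. This is a sketch under assumptions: it presumes system tables are enabled and that the `system.access.audit` table is readable by the caller; `audit_summary_query` is a hypothetical helper, and the 7-day window is arbitrary.

```python
# Python cell - build a query summarizing Unity Catalog audit events per user.
# `audit_summary_query` is a hypothetical helper, not part of any Databricks library.
def audit_summary_query(days=7):
    """Return SQL counting Unity Catalog audit events per user and action."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return (
        "SELECT user_identity.email AS user_email, action_name, COUNT(*) AS events\n"
        "FROM system.access.audit\n"
        "WHERE service_name = 'unityCatalog'\n"
        f"  AND event_date >= date_sub(current_date(), {int(days)})\n"
        "GROUP BY user_identity.email, action_name\n"
        "ORDER BY events DESC"
    )


query = audit_summary_query(days=7)
# In a notebook, the result could be viewed with display(spark.sql(query)).
print(query)
```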

Cost Optimisation

Assign cost-appropriate compute per persona: serverless SQL Warehouses for analysts, job clusters for engineering pipelines, and right-sized clusters for ML. Cluster policies per group prevent any single persona from provisioning oversized, expensive compute.
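
A cluster policy is defined as a JSON document that constrains cluster attributes. The sketch below builds one such definition for a persona group; `make_policy` is a hypothetical helper, and the worker limit, idle timeout, and node types (AWS instance names here) are placeholder assumptions to replace with values that suit your cloud and budget.

```python
# Python cell - build a cluster policy definition as JSON.
import json


def make_policy(max_workers, max_idle_minutes, node_types):
    """Return a policy definition that caps cluster size, idle time, and node types."""
    return {
        # Upper bound on autoscaling workers.
        "autoscale.max_workers": {"type": "range", "maxValue": max_workers},
        # Clusters must auto-terminate within the given idle time.
        "autotermination_minutes": {
            "type": "range",
            "maxValue": max_idle_minutes,
            "defaultValue": max_idle_minutes,
        },
        # Only the listed node types may be selected.
        "node_type_id": {"type": "allowlist", "values": node_types},
    }


# Placeholder values for an engineering group (assumptions, not recommendations).
engineering_policy = make_policy(
    max_workers=8,
    max_idle_minutes=60,
    node_types=["i3.xlarge", "i3.2xlarge"],
)
policy_json = json.dumps(engineering_policy, indent=2)
print(policy_json)
```

The resulting JSON can be pasted into the policy editor or supplied as the definition when creating a policy through the API, then assigned to the matching persona group.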

Security and Governance

Grant access by group at the least privilege needed: write for engineers, read-curated for analysts, sandbox for scientists. Row filters and column masks let multiple personas share the same table while each sees only the rows and column values they are permitted to see.
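
To make row filters and column masks concrete, the following sketch lists the SQL an administrator might run. The `region` and `email` columns, the `admins` group, the `'EMEA'` value, and the function names `region_filter` and `email_mask` are all hypothetical; only the table name comes from the steps earlier in this article.

```python
# Python cell - SQL statements for a row filter and a column mask.
# Column names, group names, and function names below are illustrative assumptions.
TABLE = "main.gold.customer_features"

governance_statements = [
    # Row filter function: admins see every row, everyone else only EMEA rows.
    "CREATE OR REPLACE FUNCTION main.gold.region_filter(region STRING) "
    "RETURN is_account_group_member('admins') OR region = 'EMEA'",
    f"ALTER TABLE {TABLE} SET ROW FILTER main.gold.region_filter ON (region)",
    # Column mask function: only data engineers see the raw email value.
    "CREATE OR REPLACE FUNCTION main.gold.email_mask(email STRING) "
    "RETURN CASE WHEN is_account_group_member('data-engineers') "
    "THEN email ELSE '***' END",
    f"ALTER TABLE {TABLE} ALTER COLUMN email SET MASK main.gold.email_mask",
]

for stmt in governance_statements:
    # In a notebook, each statement could be passed to spark.sql(stmt) instead.
    print(stmt)
```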

Common Pitfalls and Recommended Patterns

  • Granting individual permissions: assign access to persona groups for maintainability.
  • Giving analysts write access to raw tables: keep them on curated, read-only layers.
  • One shared cluster for everyone: isolate compute by persona to control cost and contention.
  • No sandbox for scientists: provide an experiment schema so production tables stay clean.
  • Skipping audit review: regularly review grants on sensitive tables.

Frequently Asked Questions

Can one person play multiple personas?

Yes, especially on smaller teams; an analytics engineer often spans analyst and engineering tasks. The platform supports this: a user who belongs to several persona groups receives the union of the permissions granted to those groups.
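
The union behaviour can be pictured with plain sets. This is only a conceptual model with made-up data, not how Unity Catalog evaluates access internally; `GROUP_PRIVILEGES` and `effective_privileges` are hypothetical names.

```python
# Python cell - conceptual model of privileges for a user in several groups.
# Hypothetical data: each group maps to a set of (securable, privilege) pairs.
GROUP_PRIVILEGES = {
    "analysts": {("main.gold", "SELECT")},
    "data-engineers": {("main.pipelines", "SELECT"), ("main.pipelines", "MODIFY")},
}


def effective_privileges(user_groups, group_privileges):
    """Return the union of privileges across all groups the user belongs to."""
    effective = set()
    for group in user_groups:
        # Groups with no recorded privileges contribute nothing.
        effective |= group_privileges.get(group, set())
    return effective


# An analytics engineer who belongs to both groups.
combined = effective_privileges(["analysts", "data-engineers"], GROUP_PRIVILEGES)
print(sorted(combined))
```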

Which persona owns data quality?

Data engineers and analytics engineers are responsible for producing reliable Silver/Gold tables, governance administrators set the standards those tables must meet, and analysts surface the issues they find downstream.

Do analysts need to know Spark?

Usually not. Analysts primarily use SQL on SQL Warehouses; Spark knowledge is more relevant to engineers and data scientists working with large-scale transformations.

How do personas avoid stepping on each other?

Unity Catalog isolates access by catalog/schema and group, and separate compute prevents resource contention, so personas collaborate on shared data without interfering.