Key Personas: Who Uses Databricks and Why
Databricks is used by data engineers, data analysts and analytics engineers, data scientists and ML engineers, and platform/governance administrators, all collaborating on one governed copy of data. Each persona works through a different surface (notebooks, SQL Warehouses, ML tooling, or admin consoles) but shares the same lakehouse and Unity Catalog. After reading, you will know what each role does on the platform, which features they rely on, and how their work connects.
- Identify the main Databricks personas and their core responsibilities
- Map each persona to the platform surfaces and features they use
- Understand how the personas hand off work across one shared lakehouse
Who this is for: Team leads, hiring managers, and new users orienting to roles on the platform.
Part of the What is Databricks section in the Databricks tutorial series.
Architecture / Concept Overview
Personas collaborate along the data lifecycle on shared, governed data. Data engineers build and maintain pipelines that produce trustworthy tables; analysts and analytics engineers model and query those tables for insight; data scientists and ML engineers build models on them; and administrators govern access, cost, and security across everyone. Unity Catalog is the common thread that lets them share data safely.
*Each persona works against the same governed tables, with administrators applying governance across all of them.*
Work flows from one persona to the next as data is refined and consumed.
*A typical handoff: engineers deliver curated tables, analysts model and report, and scientists build models that power applications.*
Key Terms
- Data engineer: builds and operates ingestion and transformation pipelines that produce reliable, governed tables.
- Data analyst: explores data and builds dashboards and reports, primarily using SQL and BI tools on curated tables.
- Analytics engineer: bridges engineering and analytics by building tested, reusable data models (often the Silver/Gold layers) with SQL.
- Data scientist: explores data and builds statistical and machine learning models to answer predictive questions.
- ML engineer: productionizes models (training pipelines, deployment, serving, and monitoring), often using Mosaic AI and MLflow.
- Platform/governance administrator: manages workspaces, identity, Unity Catalog permissions, cost policies, and security.
Prerequisites and Setup
- A Databricks workspace with role-appropriate access
- Account-level groups that map to your personas, so Unity Catalog privileges can be granted to them
- Compute suited to each role (clusters for engineering/ML, SQL Warehouses for analytics)
- Clear ownership of catalogs and schemas per team
Step-by-Step Implementation
Map personas to identity groups
Create groups that mirror your personas so permissions are assigned by role, not individual.
```sql
-- SQL cell - grant catalog usage to persona groups
GRANT USE CATALOG ON CATALOG main TO `data-engineers`;
GRANT USE CATALOG ON CATALOG main TO `analysts`;
```

Give engineers pipeline access
Engineers need write access to build Bronze/Silver/Gold tables.
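```sql
-- SQL cell - engineers can create and write tables in the engineering schema
-- Unity Catalog uses CREATE TABLE (not a bare CREATE) and also requires USE SCHEMA
GRANT USE SCHEMA, CREATE TABLE, MODIFY ON SCHEMA main.pipelines TO `data-engineers`;
```

Give analysts read access and a warehouse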
Analysts mostly read curated Gold tables through a SQL Warehouse.
```sql
-- SQL cell - analysts read the curated layer only
-- USE SCHEMA is required alongside SELECT to query tables in the schema
GRANT USE SCHEMA, SELECT ON SCHEMA main.gold TO `analysts`;
```

Enable data scientists with feature access
Data scientists read curated data and write to a sandbox/feature schema for experiments.
```python
# Python cell - a scientist reads governed data for modeling
df = spark.table("main.gold.customer_features")
# sample() keeps roughly 80% of rows; the fraction is approximate, not exact
train = df.sample(fraction=0.8, seed=42)
```

Let admins enforce guardrails
Administrators apply cluster policies and audit access so all personas operate within limits.
```sql
-- SQL cell - admins review who can access sensitive data
SHOW GRANTS ON TABLE main.gold.customer_features;
```
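The grants in the steps above can also be kept as data and rendered into statements, which makes the persona-to-permission mapping easier to review. The following is a minimal sketch: the `PERSONA_GRANTS` map, the `render_grants` helper, and the specific privilege choices are illustrative assumptions, not a recommended baseline.

```python
# Hypothetical persona map: group name -> list of (securable type, name, privileges).
# Group names and schemas follow the examples in this section; the privilege
# sets are assumptions for illustration.
PERSONA_GRANTS = {
    "data-engineers": [
        ("SCHEMA", "main.pipelines", ["USE SCHEMA", "CREATE TABLE", "MODIFY"]),
    ],
    "analysts": [
        ("SCHEMA", "main.gold", ["USE SCHEMA", "SELECT"]),
    ],
}


def render_grants(persona_grants, catalog="main"):
    """Return GRANT statements: one USE CATALOG per group, then its object grants."""
    statements = []
    for group, grants in persona_grants.items():
        statements.append(f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`;")
        for securable_type, name, privileges in grants:
            privilege_list = ", ".join(privileges)
            statements.append(
                f"GRANT {privilege_list} ON {securable_type} {name} TO `{group}`;"
            )
    return statements


for statement in render_grants(PERSONA_GRANTS):
    print(statement)
```

In a notebook, each rendered statement could be passed to `spark.sql(...)` by a principal allowed to grant on those objects. Keeping the map in version control gives reviewers a single place to see what each persona can do.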
Configuration Reference
| Persona | Primary surface | Key features | Typical access |
|---|---|---|---|
| Data engineer | Notebooks, Lakeflow | Pipelines, Delta, jobs | Write to pipeline schemas |
| Data analyst | SQL editor, dashboards | SQL Warehouses, BI | Read curated Gold |
| Analytics engineer | SQL, repos | Models, tests, views | Build Silver/Gold |
| Data scientist | Notebooks | MLflow, Mosaic AI | Read curated, write sandbox |
| ML engineer | Notebooks, jobs | Model serving, monitoring | Deploy and serve models |
| Admin | Admin console | Unity Catalog, policies | Manage all access |
Monitoring, Cost, and Security Considerations
Monitoring
Administrators monitor usage and access across personas through audit logs and system tables, while each persona watches their own job, query, or model runs. Centralized audit logs make it straightforward to see which role accessed sensitive data, supporting compliance reviews.
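As a rough illustration of an audit review, the sketch below builds a query against the `system.access.audit` system table. It assumes system tables are enabled for the account and that table access events carry the table name in `request_params.full_name_arg`; the helper name `audit_query_for_table` is hypothetical, and the filter columns should be checked against your own audit schema before relying on the results.

```python
import re

# Accept only simple three-level names (catalog.schema.table) so the value
# can be embedded in the SQL string safely.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){2}$")


def audit_query_for_table(full_table_name, days=30):
    """Build a SQL string listing who touched a table in the last `days` days."""
    if not _TABLE_NAME.match(full_table_name):
        raise ValueError(f"expected catalog.schema.table, got {full_table_name!r}")
    if not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        "SELECT event_time, user_identity.email AS user_email, action_name\n"
        "FROM system.access.audit\n"
        "WHERE service_name = 'unityCatalog'\n"
        f"  AND request_params.full_name_arg = '{full_table_name}'\n"
        f"  AND event_date >= date_sub(current_date(), {days})\n"
        "ORDER BY event_time DESC"
    )


print(audit_query_for_table("main.gold.customer_features", days=7))
```

The resulting string can be run in the SQL editor or with `spark.sql(...)` by a user who has been granted access to the system schema.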
Cost Optimisation
Assign cost-appropriate compute per persona: serverless SQL Warehouses for analysts, job clusters for engineering pipelines, and right-sized clusters for ML. Cluster policies per group prevent any single persona from provisioning oversized, expensive compute.
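A cluster policy is defined as a JSON document that constrains cluster attributes. The sketch below generates one policy definition per persona; the helper name `persona_cluster_policy`, the limits, and the node types are placeholder assumptions to adapt to your cloud and workloads.

```python
import json


def persona_cluster_policy(max_workers, autotermination_minutes, node_types):
    """Return a cluster policy definition that caps size and idle time."""
    return {
        # Upper bound on autoscaling so no cluster grows past the persona's budget.
        "autoscale.max_workers": {"type": "range", "maxValue": max_workers},
        # Fixed idle timeout that users cannot change or see in the UI.
        "autotermination_minutes": {
            "type": "fixed",
            "value": autotermination_minutes,
            "hidden": True,
        },
        # Only the listed node types can be selected.
        "node_type_id": {"type": "allowlist", "values": list(node_types)},
    }


# Placeholder limits per persona group; tune these to your own workloads.
policies = {
    "data-engineers": persona_cluster_policy(8, 30, ["i3.xlarge", "i3.2xlarge"]),
    "data-scientists": persona_cluster_policy(4, 60, ["i3.xlarge"]),
}

for group, definition in policies.items():
    print(group)
    print(json.dumps(definition, indent=2))
```

Each JSON document would then be attached to a policy in the admin console or through the cluster policies API, with usage permission granted to the matching group.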
Security and Governance
Grant access by group at the least privilege needed: write for engineers, read-curated for analysts, sandbox for scientists. Row filters and column masks let multiple personas share the same table while seeing only what they are permitted.
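To make row filters and column masks concrete, the sketch below renders the SQL for one of each. The `region` and `email` columns, the `admins` group, and the helper name are hypothetical; it also assumes both columns are strings and that the helper functions live in the same schema as the table.

```python
def row_filter_and_mask_sql(table, filter_column, mask_column,
                            privileged_group, visible_value):
    """Return SQL statements that add a row filter and a column mask to `table`."""
    schema = table.rsplit(".", 1)[0]  # catalog.schema that holds the helper functions
    filter_fn = f"{schema}.{filter_column}_filter"
    mask_fn = f"{schema}.{mask_column}_mask"
    return [
        # Row filter: privileged members see every row, others only matching rows.
        f"CREATE OR REPLACE FUNCTION {filter_fn}({filter_column} STRING)\n"
        f"RETURN is_account_group_member('{privileged_group}') "
        f"OR {filter_column} = '{visible_value}';",
        f"ALTER TABLE {table} SET ROW FILTER {filter_fn} ON ({filter_column});",
        # Column mask: privileged members see the real value, others a placeholder.
        f"CREATE OR REPLACE FUNCTION {mask_fn}({mask_column} STRING)\n"
        f"RETURN CASE WHEN is_account_group_member('{privileged_group}') "
        f"THEN {mask_column} ELSE '***' END;",
        f"ALTER TABLE {table} ALTER COLUMN {mask_column} SET MASK {mask_fn};",
    ]


for statement in row_filter_and_mask_sql(
    "main.gold.customer_features", "region", "email", "admins", "EMEA"
):
    print(statement)
```

With both in place, analysts and administrators query the same table name, and the filter and mask decide at query time what each group sees.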
Common Pitfalls and Recommended Patterns
- Granting individual permissions: assign access to persona groups for maintainability.
- Giving analysts write access to raw tables: keep them on curated, read-only layers.
- One shared cluster for everyone: isolate compute by persona to control cost and contention.
- No sandbox for scientists: provide an experiment schema so production tables stay clean.
- Skipping audit review: regularly review grants on sensitive tables.
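The first and last pitfalls can be checked together by scanning grant listings for principals that are individual users. This is a small sketch under stated assumptions: the rows are plain dictionaries with hypothetical `principal` and `privilege` keys (for example, collected from `SHOW GRANTS` output), and any principal containing `@` is treated as a user rather than a group.

```python
def find_individual_grants(grant_rows):
    """Return grant rows whose principal looks like a user email, not a group."""
    return [row for row in grant_rows if "@" in row["principal"]]


# Illustrative rows; in practice these would come from a grants listing.
sample_rows = [
    {"principal": "analysts", "privilege": "SELECT"},
    {"principal": "jane.doe@example.com", "privilege": "MODIFY"},
    {"principal": "data-engineers", "privilege": "MODIFY"},
]

for row in find_individual_grants(sample_rows):
    print(f"Direct grant to user {row['principal']}: {row['privilege']}")
```

Any row this reports is a candidate for replacing with a grant to the matching persona group.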
Frequently Asked Questions
Can one person play multiple personas?
Yes, especially on smaller teams; an analytics engineer often spans analyst and engineering tasks. Add the user to each relevant persona group: Unity Catalog privileges are additive, so the user receives the union of the permissions granted to those groups.
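The way multiple group memberships combine can be modeled as a set union. The sketch below is a toy model only: the `GROUP_PRIVILEGES` map and its contents are illustrative assumptions, not how the platform stores grants.

```python
# Hypothetical grants per group, written as (securable, privilege) pairs.
GROUP_PRIVILEGES = {
    "analysts": {("main.gold", "SELECT")},
    "data-engineers": {("main.pipelines", "MODIFY"), ("main.pipelines", "SELECT")},
}


def effective_privileges(user_groups):
    """Return the union of privileges across all groups a user belongs to."""
    combined = set()
    for group in user_groups:
        combined |= GROUP_PRIVILEGES.get(group, set())
    return combined


# An analytics engineer who belongs to both groups gets both sets of privileges.
for securable, privilege in sorted(effective_privileges(["analysts", "data-engineers"])):
    print(securable, privilege)
```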
Which persona owns data quality?
Data engineers and analytics engineers own producing reliable Silver/Gold tables, but governance administrators set the standards and analysts surface issues they find downstream.
Do analysts need to know Spark?
Usually not. Analysts primarily use SQL on SQL Warehouses; Spark knowledge is more relevant to engineers and data scientists working with large-scale transformations.
How do personas avoid stepping on each other?
Unity Catalog isolates access by catalog/schema and group, and separate compute prevents resource contention, so personas collaborate on shared data without interfering.