File-arrival trigger
Who this is for:
Architecture / Concept Overview: File-arrival trigger
A multi-task job defines a directed acyclic graph (DAG) in which each node is a task (notebook, pipeline, SQL, or script) and each edge is a dependency between two tasks. The scheduler executes tasks in topological order: tasks with no dependency on each other run in parallel, and a dependent task starts only after its upstream tasks have finished.
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
START[Job Trigger]:::source
START --> T1[Ingest Orders]:::ingestion
START --> T2[Ingest Products]:::ingestion
T1 --> T3[Join & Enrich]:::processing
T2 --> T3
T3 --> T4[Build Silver Tables]:::storage
T4 --> T5[Refresh Gold Aggregates]:::serving
T4 --> T6[Update ML Features]:::serving
T5 --> T7[Send Report]:::governance
T6 --> T7
*A real-world job DAG with parallel ingestion, sequential transformation, and fan-out to ML and reporting.*
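The topological ordering in the diagram above can be sketched with Python's standard-library `graphlib` module. This is an illustration of the ordering idea only, not the scheduler's actual implementation; the task names are hypothetical identifiers transcribed from the diagram, and the function `execution_waves` is introduced here for illustration.

```python
from graphlib import TopologicalSorter

# Task -> set of upstream tasks it depends on (hypothetical keys,
# transcribed from the diagram; the trigger node is omitted).
dependencies = {
    "ingest_orders": set(),
    "ingest_products": set(),
    "join_enrich": {"ingest_orders", "ingest_products"},
    "build_silver": {"join_enrich"},
    "refresh_gold": {"build_silver"},
    "update_ml_features": {"build_silver"},
    "send_report": {"refresh_gold", "update_ml_features"},
}


def execution_waves(deps):
    """Group tasks into waves; every task in a wave has all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks that could run in parallel
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Each wave lists tasks that have no unmet dependencies and could therefore run side by side. The sketch advances in lockstep for readability; a real scheduler would typically start a task as soon as its own upstream tasks finish, without waiting for the rest of the wave.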
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
COMPUTE[Compute Options]:::processing
COMPUTE --> JC[Job Cluster - Ephemeral]:::processing
COMPUTE --> SJC[Shared Job Cluster]:::processing
COMPUTE --> SL[Serverless]:::serving
COMPUTE --> EX[Existing Cluster]:::source
JC --> JCA[Per-task isolation]:::processing
SJC --> SJCA[Shared across tasks]:::processing
SL --> SLA[No cluster management]:::serving
EX --> EXA[Development only]:::source
*Compute options for multi-task jobs.*
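As a rough sketch, the four compute options above correspond to which compute field a task definition carries. The field names `new_cluster`, `job_cluster_key`, and `existing_cluster_id` follow the Databricks Jobs API task settings; the cluster values, task keys, and the helper `compute_kind` are hypothetical placeholders. Treating a task with no compute field as serverless is an assumption that only holds in workspaces where serverless jobs are enabled.

```python
# Placeholder cluster spec; the values are hypothetical and workspace-specific.
EXAMPLE_CLUSTER = {
    "spark_version": "<runtime-version>",
    "node_type_id": "<node-type>",
    "num_workers": 2,
}

tasks = [
    # Ephemeral job cluster created for this task only (per-task isolation).
    {"task_key": "ingest_orders", "new_cluster": EXAMPLE_CLUSTER},
    # Shared job cluster, declared once at job level and referenced by key.
    {"task_key": "join_enrich", "job_cluster_key": "shared_etl"},
    # No compute field: assumed here to mean serverless compute.
    {"task_key": "refresh_gold"},
    # Existing all-purpose cluster (hypothetical ID), suited to development.
    {"task_key": "debug_step", "existing_cluster_id": "<cluster-id>"},
]


def compute_kind(task):
    """Classify a task by which compute field it carries."""
    if "existing_cluster_id" in task:
        return "existing cluster"
    if "job_cluster_key" in task:
        return "shared job cluster"
    if "new_cluster" in task:
        return "job cluster (ephemeral)"
    return "serverless"


kinds = {task["task_key"]: compute_kind(task) for task in tasks}
for key, kind in kinds.items():
    print(f"{key}: {kind}")
```

A shared job cluster referenced through `job_cluster_key` also needs a matching entry in the job-level `job_clusters` list, which is omitted here for brevity.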
Key Terms
Prerequisites and Setup
- Notebooks or scripts for each task stored in a Databricks Repo or workspace folder.
- Permissions to create jobs and clusters.
- A target catalog and schema for output tables.
Step-by-Step Implementation
Configuration Reference
| Parameter | Description | Default |
|---|---|---|
| schedule.quartz_cron_expression | Quartz cron expression for scheduled runs | None |
| max_concurrent_runs | Maximum number of parallel runs of this job | 1 |
| timeout_seconds | Task timeout, in seconds | 0 (no limit) |
| max_retries | Number of retries on failure | 0 |
| min_retry_interval_millis | Delay between retries, in milliseconds | 0 |
| retry_on_timeout | Whether to retry a task that times out | false |
| run_if | Execution condition for dependent tasks | ALL_SUCCESS |
| base_parameters | Default parameters passed to the notebook | Empty |
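To show how the parameters in the table above might fit together, here is a minimal sketch that assembles a job settings payload as a plain Python dictionary. The defaults mirror the table; the helper `make_task`, the job name, notebook paths, parameter values, and the cron schedule are hypothetical, and the exact payload shape should be checked against the Jobs API reference for your workspace.

```python
import json

# Task-level defaults as listed in the configuration table.
TASK_DEFAULTS = {
    "timeout_seconds": 0,            # 0 means no limit
    "max_retries": 0,
    "min_retry_interval_millis": 0,
    "retry_on_timeout": False,
    "run_if": "ALL_SUCCESS",
}


def make_task(task_key, notebook_path, depends_on=(), base_parameters=None, **overrides):
    """Build one notebook task, applying table defaults unless overridden."""
    unknown = set(overrides) - set(TASK_DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported task settings: {sorted(unknown)}")
    task = {
        "task_key": task_key,
        **TASK_DEFAULTS,
        **overrides,
        "notebook_task": {
            "notebook_path": notebook_path,
            "base_parameters": dict(base_parameters or {}),
        },
    }
    if depends_on:
        task["depends_on"] = [{"task_key": key} for key in depends_on]
    return task


job_settings = {
    "name": "orders_pipeline",  # hypothetical job name
    "max_concurrent_runs": 1,
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # 02:00 every day
        "timezone_id": "UTC",
    },
    "tasks": [
        make_task("ingest_orders", "/Repos/example/ingest_orders",
                  max_retries=2, min_retry_interval_millis=60_000),
        make_task("join_enrich", "/Repos/example/join_enrich",
                  depends_on=["ingest_orders"],
                  base_parameters={"target_schema": "silver"}),
    ],
}

print(json.dumps(job_settings, indent=2))
```

Keeping the defaults in one dictionary makes it obvious which tasks deviate from them, and rejecting unknown overrides catches misspelled setting names before the payload is submitted.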