Azure Blob Writer
The Azure Blob Writer component writes data into Azure Blob Storage. It supports multiple file formats, partitioning, and save modes. Authentication is handled using either Storage Account Key or Azure AD Service Principal (Client Secret) credentials.
Configuration Sections
The Azure Blob Writer configurations are organized into the following sections:
Basic Information
Meta Information
Resource Configuration
Connection Validation
Authentication Methods
1. Write Using Secret Key
Authenticate using the Storage Account Key.
Account Key
Storage account key used to authenticate.
xxxx12345...
Account Name
Name of the Azure storage account.
mystorageacct
Container
Name of the target container.
sales-data
Blob Name
Target blob name (path + file).
transactions/2025-01-01.csv
File Format
Output file type: CSV, JSON, PARQUET, or AVRO.
PARQUET
Save Mode
Write behavior: Append or Overwrite.
Overwrite
Schema File Name
Spark schema file in JSON format.
schema.json
Column Filter
Define which columns to write. See Column Filtering.
N/A
Partition Column
Partition data by column(s).
date, region
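As an illustration of how the Secret Key fields fit together, the sketch below builds a `wasbs://` blob URL and the matching Hadoop account-key property from the Account Name, Container, and Blob Name values. The helper names are hypothetical, and the property name assumes the classic `wasbs` connector; the component may wire these settings up differently.

```python
def blob_url(account: str, container: str, blob: str) -> str:
    """Combine the Account Name, Container, and Blob Name fields into a wasbs:// URL."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{blob}"

def account_key_conf(account: str, key: str) -> dict:
    """Hadoop-style configuration entry for Storage Account Key authentication."""
    return {f"fs.azure.account.key.{account}.blob.core.windows.net": key}

url = blob_url("mystorageacct", "sales-data", "transactions/2025-01-01.csv")
# url == "wasbs://sales-data@mystorageacct.blob.core.windows.net/transactions/2025-01-01.csv"
```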
2. Write Using Principal Secret
Authenticate using an Azure Active Directory (Azure AD) Service Principal.
Client ID
Application (client) ID from Azure AD.
2c76b0a9-xxxx-xxxx-xxxx-abcdef
Tenant ID
Directory (tenant) ID from Azure AD.
72f988bf-xxxx-xxxx-xxxx-abcdef
Client Secret
Secret key of the registered application.
********
Account Name
Name of the Azure storage account.
mystorageacct
Container
Name of the target container.
finance-data
Blob Name
Target blob name.
monthly/summary.parquet
File Format
Output file type: CSV, JSON, PARQUET, or AVRO.
JSON
Save Mode
Write behavior: Append or Overwrite.
Append
Schema File Name
Spark schema file in JSON format.
finance_schema.json
Column Filter
Define which columns to write. See Column Filtering.
N/A
Partition Column
Partition data by column(s).
year, department
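For the Service Principal path, Spark-based writers typically pass the Client ID, Tenant ID, and Client Secret through Hadoop ABFS OAuth properties. The sketch below shows one plausible mapping; the exact property names depend on the connector version and are an assumption here, not a documented part of this component.

```python
def oauth_conf(account: str, client_id: str, tenant_id: str, client_secret: str) -> dict:
    """Hadoop ABFS OAuth settings commonly used for Service Principal auth (illustrative)."""
    host = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
```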
Save Modes
Append
Adds new data alongside the existing data at the target location.
Overwrite
Replaces the existing data at the target location with the new data.
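The two save modes can be modeled with a toy function; this is only an illustration of the semantics, not the component's implementation.

```python
def write(existing: list, new_rows: list, save_mode: str) -> list:
    """Toy model of the save modes: Append keeps existing rows, Overwrite replaces them."""
    if save_mode == "Append":
        return existing + new_rows
    if save_mode == "Overwrite":
        return list(new_rows)
    raise ValueError(f"unsupported save mode: {save_mode}")
```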
Column Filtering
The Column Filter section allows you to select and rename columns before writing to Azure Blob.
Name
Name of the column from upstream data.
customer_id
Alias
Alias under which the column is written to the container.
cust_id
Column Type
Data type of the column.
STRING
Additional Options:
Upload: Upload CSV/JSON/Excel to auto-populate schema.
Download Data: Export schema mapping in JSON format.
Delete Data: Clear all column filter entries.
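Conceptually, each column filter entry keeps a column, renames it to its alias, and casts it to the configured type. The sketch below models that with plain dictionaries; the field names mirror the Name, Alias, and Column Type settings above and are purely illustrative.

```python
# One entry per configured column, mirroring the Name / Alias / Column Type fields.
COLUMN_FILTER = [
    {"name": "customer_id", "alias": "cust_id", "type": "STRING"},
]

def apply_column_filter(row: dict, column_filter: list) -> dict:
    """Keep only the configured columns and emit them under their aliases."""
    out = {}
    for col in column_filter:
        value = row[col["name"]]
        if col["type"] == "STRING":
            value = str(value)
        out[col["alias"]] = value
    return out

filtered = apply_column_filter({"customer_id": 42, "email": "a@b.c"}, COLUMN_FILTER)
# filtered == {"cust_id": "42"}
```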
Partitioning
Partitioning organizes data in the container by column values, improving query performance and management.
Example: Partition by date
azure://mystorageacct/sales-data/date=2025-01-01/
azure://mystorageacct/sales-data/date=2025-01-02/
azure://mystorageacct/sales-data/date=2025-01-03/
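The Hive-style `key=value` layout shown above can be reproduced with a small path builder; the function name and the `azure://` base path are illustrative, not part of the component.

```python
def partition_prefix(base: str, row: dict, partition_columns: list) -> str:
    """Build a Hive-style key=value directory path, as in the example paths above."""
    parts = [f"{col}={row[col]}" for col in partition_columns]
    return base.rstrip("/") + "/" + "/".join(parts) + "/"

prefix = partition_prefix("azure://mystorageacct/sales-data", {"date": "2025-01-01"}, ["date"])
# prefix == "azure://mystorageacct/sales-data/date=2025-01-01/"
```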
Notes
Ensure the service principal or account has proper RBAC roles (e.g., Storage Blob Data Contributor) on the target storage account.
Prefer Parquet or Avro for production workloads due to better compression and schema support.
Use partitioning for large datasets to improve performance and data organization.
For secure deployments, store Account Keys and Client Secrets in Azure Key Vault.