Azure Writer
Use the Azure Blob Writer task to write datasets from your data job (Spark) into an Azure Blob Storage container in analytics‑ready formats (CSV, JSON, Parquet, Avro).
Prerequisites
Access to the Azure Storage account and target container with write permissions.
One of the supported authentication methods:
Shared Access Signature (SAS)
Account Secret Key
Azure AD Service Principal (Client ID, Tenant ID, Client Secret)
Network egress from the job’s compute to Azure Storage endpoints (see the connectivity sketch after this list).
Agreed output file format, save mode, and (if required) an explicit Spark schema (JSON).
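Before the first run, it can help to confirm that the job's compute can reach the storage endpoints at all. A minimal Python sketch of such a check, assuming a hypothetical account name (both the Blob and Data Lake Gen2 endpoints are probed):

import socket

account = "mystorageacct"  # hypothetical account name
for host in (f"{account}.blob.core.windows.net", f"{account}.dfs.core.windows.net"):
    # Raises an exception if the endpoint is unreachable from this network.
    socket.create_connection((host, 443), timeout=5).close()
    print(f"reachable: {host}")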
Quick Start
Drag the Azure Blob Writer task to the workspace and open it (the Meta Information tab opens by default).
Under Write using, choose an auth method: SAS, Secret Key, or Principal Secret.
Provide Account Name, Container, and Blob Name (target path/prefix).
Select File Format (CSV/JSON/PARQUET/AVRO) and Save Mode (Append/Overwrite).
(Optional) Upload a Spark schema file (JSON) under Schema File Name.
Click Save Task In Storage, then execute a small test write.
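Conceptually, the task performs a standard Spark DataFrame write. A minimal PySpark sketch of the equivalent operation, using hypothetical account, container, and prefix names (authentication setup is covered per method below):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("o-1001", 42.50)], ["order_id", "total_amount"])  # demo data

# Hypothetical target: container "curated" in account "mystorageacct".
path = "wasbs://curated@mystorageacct.blob.core.windows.net/sales/2025/09/"
df.write.format("parquet").mode("append").save(path)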
Meta Information — Field Reference
Common Output Settings (apply to all auth modes)
Account Name (required). Example: mystorageacct. Azure Storage account name.
Container (required). Example: curated. Destination container; a container is a logical unit for organizing blobs.
Blob Name (required). Example: sales/2025/09/ or exports/run_20250912/. Target blob path or prefix within the container. For distributed writes, prefer a prefix (directory-like path); the writer creates multiple part files under it. Avoid forcing a single file for large writes.
File Format (required). Example: PARQUET. Output format: CSV, JSON, PARQUET, or AVRO. Prefer Parquet for analytics.
Save Mode (required). Example: Append or Overwrite. Write behavior (see Save Mode Semantics and the sketch below).
Schema File Name (optional). Example: schema_orders.json. Spark schema (JSON) to enforce types; recommended for CSV/JSON to avoid type drift.
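Both save modes map directly onto Spark's DataFrameWriter modes. A short sketch of the behavioral difference, assuming a DataFrame df and a target_prefix like the examples above:

# Append: adds new part files under the prefix; existing files are left untouched.
df.write.format("parquet").mode("append").save(target_prefix)

# Overwrite: deletes everything under the prefix, then writes the new part files.
df.write.format("parquet").mode("overwrite").save(target_prefix)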
Write using Azure Shared Access Signature
Authentication & Location
Shared Access Signature (SAS): A Uniform Resource Identifier (URI) that provides restricted, time-limited access to specific Azure Storage resources without sharing account keys. This is the primary authentication method.
Account Name: The name of your Azure Storage account. This identifies the account where the data will be written.
Container: The name of the container within the specified Azure Storage account. A container is a top-level organizational unit, analogous to a directory, used to group and manage blobs.
Blob Name: The target blob path or prefix within the container where the data will be stored. A blob is an object used for storing large amounts of unstructured data, such as images, videos, or text files.
Data & Schema Configuration
File Format: Specify the format in which the data will be written. The following formats are supported:
CSV: Comma-Separated Values.
JSON: JavaScript Object Notation.
PARQUET: A columnar storage format optimized for big data analytics.
AVRO: A row-based format suitable for data serialization.
Save Mode: Determines how new data interacts with existing data. Select one of the following options:
Append: Adds the new data alongside any existing data at the target path. Spark writes new part files; it does not modify existing files.
Overwrite: Replaces the entire contents of the target path with the new data.
Schema File Name: Upload a Spark schema file in JSON format. This file defines the structure of the data to be written, including data types and column names.
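Under the hood, SAS access from Spark is typically wired up through the hadoop-azure connector. A minimal sketch, assuming that connector is on the classpath, the legacy wasbs:// (Blob) endpoint, and hypothetical names; the configuration key differs for the abfss:// (Data Lake Gen2) endpoint:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("o-1001", 42.50)], ["order_id", "total_amount"])  # demo data

account, container = "mystorageacct", "curated"  # hypothetical
sas_token = "sv=..."  # truncated; supply without a leading '?' and never hard-code real tokens

# SAS scoped to one container on the Blob endpoint (WASB driver).
spark.conf.set(f"fs.azure.sas.{container}.{account}.blob.core.windows.net", sas_token)

path = f"wasbs://{container}@{account}.blob.core.windows.net/sales/2025/09/"
df.write.format("parquet").mode("append").save(path)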
Write using Secret Key
Authentication & Location
Account Key: This is a security credential that grants full access to your Azure Storage account. Treat it like a password; anyone with the key can manage all of your storage resources, including blobs, files, queues, and tables.
Account Name: The name of your Azure Storage account, which identifies the location where your data will be stored.
Container: A logical unit of storage within your Azure Storage account. Think of a container as a directory or folder used to organize and manage your blobs.
Blob Name: The target blob path or prefix where your data will be written. A blob is an object designed for storing large amounts of unstructured data, such as images, videos, or text files.
Data & Schema Configuration
File Type: Select the format in which the data will be written. The available options are:
CSV: Comma-Separated Values.
JSON: JavaScript Object Notation.
PARQUET: A columnar storage format optimized for big data analytics.
AVRO: A row-based format suitable for data serialization.
Schema File Name: Upload a Spark schema file in JSON format. This file defines the structure of the data, including column names and data types, to be written.
Save Mode: Determines how the new data interacts with existing data in the destination. Choose one of the following options:
Append: Adds the new data alongside any existing data at the target path. Spark writes new part files; it does not modify existing files.
Overwrite: Replaces the entire contents of the target path with the new data.
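With the secret key option, the equivalent Spark setup registers the account key against the storage endpoint. A minimal sketch with hypothetical names; in practice the key should come from a secret store, never from source code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("o-1001", 42.50)], ["order_id", "total_amount"])  # demo data

account = "mystorageacct"  # hypothetical
account_key = "..."        # Azure Storage account access key (from a secret store)

spark.conf.set(f"fs.azure.account.key.{account}.blob.core.windows.net", account_key)

path = f"wasbs://curated@{account}.blob.core.windows.net/exports/run_20250912/"
df.write.format("csv").option("header", "true").mode("overwrite").save(path)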
Write using Principal Secret
Authentication
Client ID: This is the unique Application (client) ID assigned to your application by Azure Active Directory (Azure AD) during the registration process. It serves as the application's unique identifier.
Tenant ID: Also known as the Directory ID, this is a unique identifier for your Azure AD tenant. It represents your organization or developer account and is used to identify the entity with which the application is associated.
Client Secret: This is a secure credential, similar to a password, used by your application to authenticate itself to Azure AD. It is crucial for establishing a secure connection and should be managed with care.
Location & Data Settings
Account Name: The name of your Azure Storage account, which identifies the location where the data will be written.
Container: A logical unit of storage in Azure Blob Storage that can hold blobs. It's similar to a directory or folder in a file system and is used to organize and manage blobs.
Blob Name: The target blob path or prefix where the data will be stored. A blob is an object used for storing unstructured data, such as images, videos, or text files.
File & Schema Configuration
File Type: Select the format in which the data will be written. The following options are available:
CSV: Comma-Separated Values.
JSON: JavaScript Object Notation.
PARQUET: A columnar storage format optimized for big data analytics.
AVRO: A row-based format suitable for data serialization.
Save Mode: Determines how the new data interacts with any existing data at the destination. Choose from the following:
Append: Adds the new data alongside any existing data at the target path. Spark writes new part files; it does not modify existing files.
Overwrite: Replaces the entire contents of the target path with the new data.
Schema File Name: Upload a Spark schema file in JSON format. This file defines the structure of the data to be written, including column names and their respective data types.
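Service principal (OAuth) authentication is normally used against the abfss:// (Data Lake Storage Gen2) endpoint via the Hadoop ABFS connector. A minimal sketch with hypothetical credentials:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("o-1001", 42.50)], ["order_id", "total_amount"])  # demo data

account = "mystorageacct"                           # hypothetical
tenant_id = "00000000-0000-0000-0000-000000000000"  # hypothetical Directory (tenant) ID
client_id = "11111111-1111-1111-1111-111111111111"  # hypothetical Application (client) ID
client_secret = "..."                               # fetch from a secret store in practice

endpoint = f"{account}.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{endpoint}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{endpoint}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{endpoint}", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{endpoint}", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{endpoint}",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

path = f"abfss://curated@{endpoint}/sales/2025/09/"
df.write.format("parquet").mode("append").save(path)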
Schema Handling (Optional)
Provide a Spark schema JSON when writing CSV/JSON, or when strict typing is required.
Example (snippet):
{
  "type": "struct",
  "fields": [
    {"name": "order_id", "type": "string", "nullable": false},
    {"name": "customer_id", "type": "string", "nullable": true},
    {"name": "order_ts", "type": "timestamp", "nullable": true},
    {"name": "total_amount", "type": "decimal(18,2)", "nullable": true}
  ]
}
Ensure that decimal precision/scale and timestamp time zone conventions match downstream consumers.
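To enforce such a schema in Spark, the JSON can be parsed into a StructType and applied when the dataset is read or constructed. A minimal PySpark sketch, assuming the snippet above is saved locally as schema_orders.json and a hypothetical input path:

import json

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession.builder.getOrCreate()

with open("schema_orders.json") as f:
    raw = json.load(f)

# Some PySpark versions require "metadata" and "nullable" on every field; fill defaults.
for field in raw.get("fields", []):
    field.setdefault("metadata", {})
    field.setdefault("nullable", True)

schema = StructType.fromJson(raw)

# Applying the schema on read yields explicit types instead of inference (no type drift).
df = spark.read.schema(schema).json("input/orders/")  # hypothetical input path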