Azure Writer
Use the Azure Blob Writer task to write datasets from your data job (Spark) into an Azure Blob Storage container in analytics‑ready formats (CSV, JSON, Parquet, Avro).
Prerequisites
Access to the Azure Storage account and target container with write permissions.
One of the supported authentication methods:
Shared Access Signature (SAS)
Account Secret Key
Azure AD Service Principal (Client ID, Tenant ID, Client Secret)
Network egress from the job’s compute to Azure Storage endpoints (see the connectivity sketch after this list).
Agreed output file format, save mode, and (if required) an explicit Spark schema (JSON).
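Before the first run, it can help to confirm that the job's compute can reach the storage endpoints at all. A minimal Python sketch of such a check, assuming a hypothetical account name (both the Blob and Data Lake Gen2 endpoints are probed):

import socket

account = "mystorageacct"  # hypothetical account name
for host in (f"{account}.blob.core.windows.net", f"{account}.dfs.core.windows.net"):
    # Raises an exception if the endpoint is unreachable from this network.
    socket.create_connection((host, 443), timeout=5).close()
    print(f"reachable: {host}")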
Quick Start
Drag the Azure Blob Writer task to the workspace and open it (the Meta Information tab opens by default).
Under Write using, choose an auth method: SAS, Secret Key, or Principal Secret.
Provide Account Name, Container, and Blob Name (target path/prefix).
Select File Format (CSV/JSON/PARQUET/AVRO) and Save Mode (Append/Overwrite).
(Optional) Upload a Spark schema file (JSON) under Schema File Name.
Click Save Task In Storage, then execute a small test write.
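Conceptually, the task performs a standard Spark DataFrame write. A minimal PySpark sketch of the equivalent operation, using hypothetical account, container, and prefix names (authentication setup is covered per method below):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("o-1001", 42.50)], ["order_id", "total_amount"])  # demo data

# Hypothetical target: container "curated" in account "mystorageacct".
path = "wasbs://curated@mystorageacct.blob.core.windows.net/sales/2025/09/"
df.write.format("parquet").mode("append").save(path)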
Meta Information — Field Reference
Common Output Settings (apply to all auth modes)
Account Name (required). Example: mystorageacct. Azure Storage account name.
Container (required). Example: curated. Destination container; a container is a logical unit for organizing blobs.
Blob Name (required). Example: sales/2025/09/ or exports/run_20250912/. Target blob path or prefix within the container. For distributed writes, prefer a prefix (directory-like path); the writer creates multiple part files under it. Avoid forcing a single file for large writes.
File Format (required). Example: PARQUET. Output format: CSV, JSON, PARQUET, or AVRO. Prefer Parquet for analytics.
Save Mode (required). Example: Append or Overwrite. Write behavior (see Save Mode Semantics and the sketch below).
Schema File Name (optional). Example: schema_orders.json. Spark schema (JSON) to enforce types; recommended for CSV/JSON to avoid type drift.
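Both save modes map directly onto Spark's DataFrameWriter modes. A short sketch of the behavioral difference, assuming a DataFrame df and a target_prefix like the examples above:

# Append: adds new part files under the prefix; existing files are left untouched.
df.write.format("parquet").mode("append").save(target_prefix)

# Overwrite: deletes everything under the prefix, then writes the new part files.
df.write.format("parquet").mode("overwrite").save(target_prefix)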
Write using Azure Shared Access Signature
Authentication & Location
Shared Access Signature (SAS): A Uniform Resource Identifier (URI) that provides restricted, time-limited access to specific Azure Storage resources without sharing account keys. This is the primary authentication method.
Account Name: The name of your Azure Storage account. This identifies the account where the data will be written.
Container: The name of the container within the specified Azure Storage account. A container is a top-level organizational unit, analogous to a directory, used to group and manage blobs.
Blob Name: The target blob path or prefix within the container where the data will be stored. A blob is an object used for storing large amounts of unstructured data, such as images, videos, or text files.
Data & Schema Configuration
File Format: Specify the format in which the data will be written. The following formats are supported:
CSV: Comma-Separated Values.
JSON: JavaScript Object Notation.
PARQUET: A columnar storage format optimized for big data analytics.
AVRO: A row-based format suitable for data serialization.
Save Mode: Determines how new data interacts with existing data. Select one of the following options:
Append: Adds the new data alongside any existing data at the target path. Spark writes new part files; it does not modify existing files.
Overwrite: Replaces the entire contents of the target path with the new data.
Schema File Name: Upload a Spark schema file in JSON format. This file defines the structure of the data to be written, including data types and column names.
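Under the hood, SAS access from Spark is typically wired up through the hadoop-azure connector. A minimal sketch, assuming that connector is on the classpath, the legacy wasbs:// (Blob) endpoint, and hypothetical names; the configuration key differs for the abfss:// (Data Lake Gen2) endpoint:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("o-1001", 42.50)], ["order_id", "total_amount"])  # demo data

account, container = "mystorageacct", "curated"  # hypothetical
sas_token = "sv=..."  # truncated; supply without a leading '?' and never hard-code real tokens

# SAS scoped to one container on the Blob endpoint (WASB driver).
spark.conf.set(f"fs.azure.sas.{container}.{account}.blob.core.windows.net", sas_token)

path = f"wasbs://{container}@{account}.blob.core.windows.net/sales/2025/09/"
df.write.format("parquet").mode("append").save(path)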
Write using Secret Key
Authentication & Location
Account Key: This is a security credential that grants full access to your Azure Storage account. Treat it like a password; anyone with the key can manage all of your storage resources, including blobs, files, queues, and tables.
Account Name: The name of your Azure Storage account, which identifies the location where your data will be stored.
Container: A logical unit of storage within your Azure Storage account. Think of a container as a directory or folder used to organize and manage your blobs.
Blob Name: The target blob path or prefix where your data will be written. A blob is an object designed for storing large amounts of unstructured data, such as images, videos, or text files.
Data & Schema Configuration
File Type: Select the format in which the data will be written. The available options are:
CSV: Comma-Separated Values.
JSON: JavaScript Object Notation.
PARQUET: A columnar storage format optimized for big data analytics.
AVRO: A row-based format suitable for data serialization.
Schema File Name: Upload a Spark schema file in JSON format. This file defines the structure of the data, including column names and data types, to be written.
Save Mode: Determines how the new data interacts with existing data in the destination. Choose one of the following options:
Append: Adds the new data alongside any existing data at the target path. Spark writes new part files; it does not modify existing files.
Overwrite: Replaces the entire contents of the target path with the new data.
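With the secret key option, the equivalent Spark setup registers the account key against the storage endpoint. A minimal sketch with hypothetical names; in practice the key should come from a secret store, never from source code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("o-1001", 42.50)], ["order_id", "total_amount"])  # demo data

account = "mystorageacct"  # hypothetical
account_key = "..."        # Azure Storage account access key (from a secret store)

spark.conf.set(f"fs.azure.account.key.{account}.blob.core.windows.net", account_key)

path = f"wasbs://curated@{account}.blob.core.windows.net/exports/run_20250912/"
df.write.format("csv").option("header", "true").mode("overwrite").save(path)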
Write using Principal Secret
Authentication
Client ID: This is the unique Application (client) ID assigned to your application by Azure Active Directory (Azure AD) during the registration process. It serves as the application's unique identifier.
Tenant ID: Also known as the Directory ID, this is a unique identifier for your Azure AD tenant. It represents your organization or developer account and is used to identify the entity with which the application is associated.
Client Secret: This is a secure credential, similar to a password, used by your application to authenticate itself to Azure AD. It is crucial for establishing a secure connection and should be managed with care.
Location & Data Settings
Account Name: The name of your Azure Storage account, which identifies the location where the data will be written.
Container: A logical unit of storage in Azure Blob Storage that can hold blobs. It's similar to a directory or folder in a file system and is used to organize and manage blobs.
Blob Name: The target blob path or prefix where the data will be stored. A blob is an object used for storing unstructured data, such as images, videos, or text files.
File & Schema Configuration
File Type: Select the format in which the data will be written. The following options are available:
CSV: Comma-Separated Values.
JSON: JavaScript Object Notation.
PARQUET: A columnar storage format optimized for big data analytics.
AVRO: A row-based format suitable for data serialization.
Save Mode: Determines how the new data interacts with any existing data at the destination. Choose from the following:
Append: Adds the new data alongside any existing data at the target path. Spark writes new part files; it does not modify existing files.
Overwrite: Replaces the entire contents of the target path with the new data.
Schema File Name: Upload a Spark schema file in JSON format. This file defines the structure of the data to be written, including column names and their respective data types.
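Service principal (OAuth) authentication is normally used against the abfss:// (Data Lake Storage Gen2) endpoint via the Hadoop ABFS connector. A minimal sketch with hypothetical credentials:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("o-1001", 42.50)], ["order_id", "total_amount"])  # demo data

account = "mystorageacct"                           # hypothetical
tenant_id = "00000000-0000-0000-0000-000000000000"  # hypothetical Directory (tenant) ID
client_id = "11111111-1111-1111-1111-111111111111"  # hypothetical Application (client) ID
client_secret = "..."                               # fetch from a secret store in practice

endpoint = f"{account}.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{endpoint}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{endpoint}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{endpoint}", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{endpoint}", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{endpoint}",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

path = f"abfss://curated@{endpoint}/sales/2025/09/"
df.write.format("parquet").mode("append").save(path)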
Schema Handling (Optional)
Provide a Spark schema JSON when writing CSV/JSON, or when strict typing is required.
Example (snippet):
{
  "type": "struct",
  "fields": [
    {"name": "order_id", "type": "string", "nullable": false},
    {"name": "customer_id", "type": "string", "nullable": true},
    {"name": "order_ts", "type": "timestamp", "nullable": true},
    {"name": "total_amount", "type": "decimal(18,2)", "nullable": true}
  ]
}
Ensure that decimal precision/scale and timestamp time zone conventions match downstream consumers.
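To enforce such a schema in Spark, the JSON can be parsed into a StructType and applied when the dataset is read or constructed. A minimal PySpark sketch, assuming the snippet above is saved locally as schema_orders.json and a hypothetical input path:

import json

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession.builder.getOrCreate()

with open("schema_orders.json") as f:
    raw = json.load(f)

# Some PySpark versions require "metadata" and "nullable" on every field; fill defaults.
for field in raw.get("fields", []):
    field.setdefault("metadata", {})
    field.setdefault("nullable", True)

schema = StructType.fromJson(raw)

# Applying the schema on read yields explicit types instead of inference (no type drift).
df = spark.read.schema(schema).json("input/orders/")  # hypothetical input path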