Azure Blob Reader (Docker)

The Azure Blob Reader (Docker) component reads data stored in Azure Blob Storage. It runs as a Docker-based component and supports multiple authentication mechanisms for secure access. Files can be ingested in common formats including CSV, JSON, Parquet, Avro, and XML.

Configuration Sections

The component's configuration is organized into the following sections:

  • Basic Information

  • Meta Information

  • Resource Configuration

  • Connection Validation

Authentication Methods

The component supports three authentication methods for connecting to Azure Blob Storage:

  1. Shared Access Signature (SAS): Recommended for temporary, revocable access.

  2. Secret Key (Storage Account Key): Full account access; use with caution.

  3. Principal Secret (Azure AD Service Principal): Enterprise-grade, app-based access.

⚠️ Security Best Practices

  • Prefer SAS tokens for temporary and granular access.

  • Store Secret Keys and Principal Secrets securely in Azure Key Vault.

  • Avoid hardcoding credentials in pipelines.

1. Using Shared Access Signature (SAS)

Parameter | Description | Example
--- | --- | ---
Shared Access Signature | SAS token (the query string of a SAS URI) that grants restricted access to storage resources. | ?sv=2025-01-01&ss=b&srt=...
Account Name | Azure storage account name. | myazureaccount
Container | Name of the container. | sales-data
File Type | File type: CSV, JSON, PARQUET, AVRO, XML. | CSV
Read Directory | If enabled, reads all blobs in the container. | true (default)
Blob Name | Specific blob to read; used when Read Directory is disabled. | transactions.csv
Column Filter | Selects columns and assigns each an alias and data type. | See Column Filtering.
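To make the relationship between these parameters concrete, the following minimal sketch shows how they could combine into the HTTPS URL of an Azure Blob Storage REST request. The function name `build_sas_url` and its arguments are hypothetical and are not part of the component; how the component builds its requests internally is not documented here. Only the URL layout and the List Blobs query parameters come from the Azure Storage REST API.

```python
from urllib.parse import quote


def build_sas_url(account_name, container, sas_token,
                  blob_name=None, read_directory=True):
    """Return the URL a SAS-authorized request would target (illustrative only)."""
    base = f"https://{account_name}.blob.core.windows.net/{quote(container)}"
    token = sas_token.lstrip("?")  # accept tokens with or without the leading "?"
    if read_directory:
        # List Blobs operation on the container; the SAS needs list permission.
        return f"{base}?restype=container&comp=list&{token}"
    if not blob_name:
        raise ValueError("blob_name is required when read_directory is False")
    # quote() keeps "/" so virtual-directory paths inside the blob name survive.
    return f"{base}/{quote(blob_name)}?{token}"
```

With Read Directory disabled and Blob Name set to transactions.csv, the sketch yields a URL ending in /sales-data/transactions.csv followed by the SAS query string.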

2. Using Secret Key

Parameter | Description | Example
--- | --- | ---
Account Key | Storage account key for Shared Key authorization. | xxxx12345...
Account Name | Azure storage account name. | myazureaccount
Container | Name of the container. | finance-data
File Type | File type: CSV, JSON, PARQUET, AVRO, XML. | PARQUET
Read Directory | If enabled, reads all blobs in the container. | true
Blob Name | Specific blob to read; used when Read Directory is disabled. | q1_data.parquet
Column Filter | Selects columns and assigns each an alias and data type. | See Column Filtering.
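As an illustration of keeping the account key out of pipeline definitions, the sketch below assembles a standard Azure Storage connection string from an environment variable. The helper `build_connection_string` and the variable name `AZURE_STORAGE_KEY` are assumptions made for this example; the component itself may accept the key differently.

```python
import os


def build_connection_string(account_name, key_env_var="AZURE_STORAGE_KEY"):
    """Build a Shared Key connection string without hardcoding the key."""
    account_key = os.environ.get(key_env_var)
    if not account_key:
        raise RuntimeError(f"environment variable {key_env_var} is not set")
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        "EndpointSuffix=core.windows.net"
    )
```

In practice the environment variable would be populated from a secret store such as Azure Key Vault rather than typed into a shell or configuration file.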

3. Using Principal Secret

Parameter | Description | Example
--- | --- | ---
Client ID | Application (client) ID from Azure AD. | 2c76b0a9-xxxx-xxxx-xxxx-abcdef
Tenant ID | Directory (tenant) ID of your Azure AD instance. | 72f988bf-xxxx-xxxx-xxxx-abcdef
Client Secret | Client secret of the service principal. | ********
Account Name | Azure storage account name. | myazureaccount
File Type | File type: CSV, JSON, PARQUET, AVRO, XML. | JSON
Read Directory | If enabled, reads all blobs in the container. | true
Blob Name | Specific blob to read; used when Read Directory is disabled. | archive.json
Column Filter | Selects columns and assigns each an alias and data type. | See Column Filtering.
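For readers unfamiliar with service principal authentication, the sketch below shows how the Client ID, Tenant ID, and Client Secret map onto an OAuth 2.0 client-credentials token request for Azure Storage. It only builds the request and sends nothing. The function `build_token_request` is hypothetical; the token endpoint and the storage scope are the standard Microsoft identity platform values, and whether the component uses exactly this flow is an assumption.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id, client_id, client_secret):
    """Return (url, form_body) for a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Scope that requests a token valid for Azure Storage.
        "scope": "https://storage.azure.com/.default",
    })
    return url, body
```

The body would be sent as an application/x-www-form-urlencoded POST, and the returned access token is then presented as a Bearer token on storage requests. The service principal also needs a data-plane role on the storage account, such as Storage Blob Data Reader.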

File Type-Specific Behavior

CSV

  • Header: Use the first row as column headers.

  • Infer Schema: Automatically detect schema.
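The effect of the two CSV options can be sketched in plain Python as follows. This is an illustration of the general technique, not the component's implementation: the type names INTEGER and DOUBLE, the fallback column names `_c0`, `_c1`, and both function names are assumptions made for the example.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"
    for name, cast in (("INTEGER", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            continue  # this type does not fit; try the next wider one
        return name
    return "STRING"


def read_csv(text, header=True, infer_schema=True):
    """Return (data_rows, schema) for CSV text under the two options."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], {}
    if header:
        names, data = rows[0], rows[1:]
    else:
        # Without a header row, generate placeholder column names.
        names = [f"_c{i}" for i in range(len(rows[0]))]
        data = rows
    columns = list(zip(*data)) if data else [() for _ in names]
    schema = {
        name: (infer_type(col) if infer_schema else "STRING")
        for name, col in zip(names, columns)
    }
    return data, schema
```

Inference has to scan every value in every column, which is the processing overhead an explicit schema avoids.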

JSON

  • Multiline: Enable for multiline JSON records.

  • Charset: Character encoding (UTF-8, ISO-8859-1).
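The sketch below illustrates one plausible reading of these two options, assuming that with Multiline disabled each line holds one complete JSON record, and with Multiline enabled the whole file is a single JSON document that may span lines. That interpretation, and the function name `read_json`, are assumptions; the component's exact semantics are not specified here.

```python
import json


def read_json(data, multiline=False, charset="UTF-8"):
    """Decode raw bytes with the given charset and return a list of records."""
    text = data.decode(charset)
    if multiline:
        # The whole payload is one JSON value, possibly spread over many lines.
        parsed = json.loads(text)
        return parsed if isinstance(parsed, list) else [parsed]
    # One JSON record per non-blank line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Choosing the wrong charset typically shows up as a decode error or as garbled non-ASCII characters in string fields.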

PARQUET

  • No additional configuration required.

AVRO

  • Compression: Snappy (default) or Deflate.

  • Compression Level: 0–9; available only when Deflate is selected.

XML

  • Root Tag: Root element of the XML.

  • Row Tags: Defines row-level elements.

  • Join Row Tags: Enable to combine multiple row tags.

  • Infer Schema: Automatically detect schema from XML structure.
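To show what a row tag does, the following sketch turns every element with a given tag into one record, using only the Python standard library. It assumes flat rows whose child elements become columns; the function name `read_xml_rows` is hypothetical, and the Join Row Tags and Infer Schema options are not modeled.

```python
import xml.etree.ElementTree as ET


def read_xml_rows(text, row_tag):
    """Return one dict per element named row_tag, keyed by child tag."""
    root = ET.fromstring(text)
    return [
        {child.tag: child.text for child in row}
        for row in root.iter(row_tag)
    ]
```

For a document whose root tag is orders and whose rows are order elements, calling `read_xml_rows(xml_text, "order")` returns one dictionary per order, with all values as strings.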

Column Filtering

The Column Filter section allows selecting specific columns instead of retrieving the entire dataset.

Field | Description | Example
--- | --- | ---
Source Field | Column name from the blob. | customer_id
Destination Field | Alias name for the column. | cust_id
Column Type | Data type of the column. | STRING
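The three fields can be read as a projection, a rename, and a cast applied to each record. The sketch below is a minimal illustration under that reading: the `CASTS` table, the type names other than STRING, and the function `apply_column_filter` are assumptions for the example and do not describe the component's internals.

```python
# Hypothetical mapping from Column Type names to Python casts.
CASTS = {"STRING": str, "INTEGER": int, "DOUBLE": float}


def apply_column_filter(rows, column_filter):
    """Keep only the listed columns, renaming and casting each one.

    column_filter is a list of (source_field, destination_field, column_type).
    """
    return [
        {dst: CASTS[col_type](row[src]) for src, dst, col_type in column_filter}
        for row in rows
    ]
```

For instance, a filter entry of ("customer_id", "cust_id", "STRING") drops every other column, renames customer_id to cust_id, and converts its values to strings.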

Additional Options:

  • Upload File: Upload CSV/JSON/Excel to auto-populate schema.

  • Download Data: Export schema in JSON.

  • Delete Data: Clear schema configuration.

Notes

  • SAS tokens are recommended for temporary access with fine-grained control.

  • Secret Keys grant full control of the storage account; use them only when required and store them in Azure Key Vault.

  • Principal Secret authentication is best for enterprise-scale applications with Azure AD.

  • For JSON and CSV files, schema inference may add processing overhead; consider providing explicit schemas for production workloads.