Azure Blob Reader

The Azure Blob Storage Reader component reads data files from an Azure Blob Storage container. It supports Shared Key and Service Principal (Client Secret) authentication, and it can read files in CSV, JSON, Parquet, Avro, and XML formats.

Configuration Sections

The component's configuration is organized into the following sections:

  • Basic Information

  • Meta Information

  • Resource Configuration

  • Connection Validation

Authentication Methods

1. Shared Key Authorization

Authenticate using the Account Key and Account Name of your Azure storage account.

| Parameter | Description | Example | Required |
|---|---|---|---|
| Account Key | Key used to authorize access to the storage account. | xxxxx12345... | Yes |
| Account Name | Name of the Azure storage account. | myazureaccount | Yes |
| Container | Name of the container containing the files. | sales-data | Yes |

2. Service Principal (Client Secret)

Authenticate using an Azure AD application (service principal).

| Parameter | Description | Example | Required |
|---|---|---|---|
| Client ID | Application (client) ID assigned by Azure AD. | 2c76b0a9-xxxx-xxxx-xxxx-abcdef | Yes |
| Tenant ID | Globally unique identifier (GUID) of your Azure AD tenant. | 72f988bf-xxxx-xxxx-xxxx-abcdef | Yes |
| Client Secret | Secret key (password) of the service principal. | ******** | Yes |
| Account Name | Name of the Azure storage account. | myazureaccount | Yes |
| Container | Name of the container containing the files. | finance-data | Yes |
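The required parameters for the two authentication methods can be checked before a connection is attempted. The following is a minimal sketch, not the component's own validation logic: the `REQUIRED_PARAMS` dictionary, the `missing_auth_params` function, and the `"Authentication"` key are hypothetical names introduced for illustration, and the parameter lists are copied from the tables above.

```python
# Required parameters per authentication method, as listed in the tables above.
# All names in this sketch are hypothetical; they are not part of the component.
REQUIRED_PARAMS = {
    "Shared Key": ["Account Key", "Account Name", "Container"],
    "Service Principal": [
        "Client ID",
        "Tenant ID",
        "Client Secret",
        "Account Name",
        "Container",
    ],
}


def missing_auth_params(config: dict) -> list:
    """Return the required parameters that are absent or empty in `config`."""
    method = config.get("Authentication")
    if method not in REQUIRED_PARAMS:
        raise ValueError(f"Unknown authentication method: {method!r}")
    return [name for name in REQUIRED_PARAMS[method] if not config.get(name)]


complete = {
    "Authentication": "Shared Key",
    "Account Name": "myazureaccount",
    "Account Key": "xxxxx12345",
    "Container": "sales-data",
}
incomplete = {
    "Authentication": "Service Principal",
    "Client ID": "2c76b0a9-xxxx-xxxx-xxxx-abcdef",
    "Account Name": "myazureaccount",
    "Container": "finance-data",
}

print(missing_auth_params(complete))    # []
print(missing_auth_params(incomplete))  # ['Tenant ID', 'Client Secret']
```

Keeping the required fields in one dictionary means a new authentication method would only need a new entry, not a new code path.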

File Options

| Parameter | Description | Example | Required |
|---|---|---|---|
| File Type | Format of the files to read. Supported values: CSV, JSON, PARQUET, AVRO, XML. | CSV | Yes |
| Read Directory | If enabled, reads all blobs in the container. | true (default) | Yes |
| Blob Name | Specific blob to read. Appears only if Read Directory is disabled. | 2025_sales.csv | Conditional (when Read Directory is disabled) |
| Limit | Maximum number of records to read. | 1000 | Optional |
| Query | Spark SQL query to run on the data. Use inputDf as the table name. | SELECT * FROM inputDf | Optional |
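The interaction between Read Directory and Blob Name can be sketched as a small path-building function. This is an illustration under assumptions: it uses the `wasbs://` URI scheme of the Hadoop Azure Blob Storage connector, which this document does not confirm the component uses, and `blob_read_path` is a hypothetical helper name.

```python
def blob_read_path(account_name, container, read_directory=True, blob_name=None):
    """Build a wasbs:// URI for the whole container or for a single blob.

    Hypothetical helper: the URI layout follows the Hadoop Azure connector
    convention and is an assumption about how the data could be addressed.
    """
    base = f"wasbs://{container}@{account_name}.blob.core.windows.net/"
    if read_directory:
        # Read Directory enabled: target every blob in the container.
        return base
    if not blob_name:
        raise ValueError("Blob Name is required when Read Directory is disabled")
    return base + blob_name.lstrip("/")


print(blob_read_path("myazureaccount", "sales-data"))
print(blob_read_path("myazureaccount", "sales-data", False, "2025_sales.csv"))
```

The second call mirrors a configuration with Read Directory disabled, where Blob Name becomes mandatory.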

Column Filtering

You can specify and rename columns before ingestion.

| Field | Description | Example |
|---|---|---|
| Source Field | Name of the column in the blob. | customer_id |
| Destination Field | Alias for the column name. | cust_id |
| Column Type | Data type of the column. | STRING |

Additional Actions:

  • Upload: Upload a file (CSV, JSON, or Excel); its column names are auto-populated.

  • Download Data: Download column filter details in JSON format.

  • Delete Data: Clear all column filter details.
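One way to picture the effect of a column filter is as a Spark SQL SELECT list that casts each source field and renames it to its destination field. The sketch below is an assumption about the equivalent SQL, not a description of the component's internals; `select_clause` and the dictionary keys `source`, `destination`, and `type` are hypothetical names.

```python
def select_clause(column_filters):
    """Turn column filter entries into a Spark SQL SELECT list.

    Each entry is a dict with a 'source' field and optional 'destination'
    and 'type' fields (hypothetical key names used only in this sketch).
    """
    parts = []
    for entry in column_filters:
        expr = entry["source"]
        if entry.get("type"):
            # Cast to the configured Column Type.
            expr = f"CAST({expr} AS {entry['type']})"
        # Fall back to the source name when no alias is given.
        alias = entry.get("destination") or entry["source"]
        parts.append(f"{expr} AS {alias}")
    return ", ".join(parts)


filters = [
    {"source": "customer_id", "destination": "cust_id", "type": "STRING"},
    {"source": "amount", "type": "DOUBLE"},
]
query = f"SELECT {select_clause(filters)} FROM inputDf"
print(query)
# SELECT CAST(customer_id AS STRING) AS cust_id, CAST(amount AS DOUBLE) AS amount FROM inputDf
```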

File Type-Specific Configurations

CSV

  • Header: Enable if the first row contains column headers.

  • Infer Schema: Enable automatic schema detection.

JSON

  • Multiline: Enable if JSON records span multiple lines.

  • Charset: Define encoding (e.g., UTF-8, ISO-8859-1).

PARQUET

  • No additional fields required.

AVRO

  • Compression: Select Deflate or Snappy (default).

  • Compression Level: Available only when Deflate is selected. Choose a level from 0 to 9; higher values compress more but take longer.
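The effect of the Deflate compression level can be observed with Python's built-in zlib module, which implements the same DEFLATE algorithm. This is only an illustration with generated sample data: Avro's own container format adds its own framing, so the exact sizes in a real Avro file will differ.

```python
import zlib

# Generated sample data resembling CSV rows (invented for this sketch).
data = "".join(
    f"customer_{i},{(i * 37) % 1000}\n" for i in range(5000)
).encode("utf-8")

# Compress the same input at several levels and compare the output sizes.
sizes = {level: len(zlib.compress(data, level)) for level in (0, 1, 6, 9)}

for level, size in sizes.items():
    print(f"level {level}: {size} bytes ({size / len(data):.1%} of input)")
```

Level 0 stores the data without compressing it, so its output is slightly larger than the input; the higher levels spend more CPU time searching for repeated sequences.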

XML

  • Path: Path of the XML file.

  • Root Tag: Root element of the XML.

  • Row Tags: Tag identifying row-level elements.

  • Join Row Tags: Enable to combine multiple row tags.
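Because the Query field uses Spark SQL, these settings plausibly correspond to Spark reader options. The mapping below is an assumption, not documented behavior of the component: it pairs each setting with the Spark option of the same meaning (`header`, `inferSchema`, `multiLine`, `encoding`, `rowTag`), and `reader_options` is a hypothetical helper name.

```python
def reader_options(file_type, settings):
    """Map component settings to Spark reader option names (assumed mapping)."""
    def flag(name):
        # Spark options are passed as strings, so booleans become "true"/"false".
        return str(bool(settings.get(name, False))).lower()

    file_type = file_type.upper()
    if file_type == "CSV":
        return {"header": flag("Header"), "inferSchema": flag("Infer Schema")}
    if file_type == "JSON":
        return {
            "multiLine": flag("Multiline"),
            "encoding": settings.get("Charset", "UTF-8"),
        }
    if file_type == "XML":
        return {"rowTag": settings["Row Tags"]}
    # PARQUET and AVRO need no read options in this sketch.
    return {}


print(reader_options("CSV", {"Header": True, "Infer Schema": True}))
print(reader_options("JSON", {"Multiline": True, "Charset": "ISO-8859-1"}))
print(reader_options("XML", {"Row Tags": "record"}))
```

In a PySpark session, a dictionary like this could be passed to the reader with `spark.read.options(**opts)`, assuming the component is indeed backed by Spark's built-in readers.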

Example Configurations

Example 1: CSV File with Shared Key

Authentication: Shared Key
Account Name: myazureaccount
Account Key: ********
Container: sales-data
File Type: CSV
Read Directory: false
Blob Name: 2025_sales.csv
Header: true
Infer Schema: true
Query: SELECT customer_id, amount FROM inputDf WHERE amount > 1000
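To preview what the Query in Example 1 returns, it can be run against a few sample rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for Spark SQL, which is an assumption made so the example runs without a cluster; the sample rows and the `region` column are invented for illustration.

```python
import sqlite3

# In-memory table named inputDf, matching the table name the Query field expects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inputDf (customer_id TEXT, amount REAL, region TEXT)")
conn.executemany(
    "INSERT INTO inputDf VALUES (?, ?, ?)",
    [
        ("C001", 250.0, "EU"),
        ("C002", 1800.0, "US"),
        ("C003", 1000.0, "EU"),   # exactly 1000, so excluded by "> 1000"
        ("C004", 4200.5, "APAC"),
    ],
)

query = "SELECT customer_id, amount FROM inputDf WHERE amount > 1000"
rows = conn.execute(query).fetchall()
print(rows)  # [('C002', 1800.0), ('C004', 4200.5)]
conn.close()
```

Only the two selected columns are returned, and the row with an amount of exactly 1000 is filtered out because the comparison is strict.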

Example 2: JSON File with Service Principal (Client Secret)

Authentication: Service Principal
Client ID: 2c76b0a9-xxxx-xxxx-xxxx-abcdef
Tenant ID: 72f988bf-xxxx-xxxx-xxxx-abcdef
Client Secret: ********
Account Name: myazureaccount
Container: logs
File Type: JSON
Multiline: true
Charset: UTF-8
Read Directory: true
Limit: 5000
Query: SELECT event_type, timestamp FROM inputDf WHERE event_type = 'ERROR'

Notes

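  • Shared Key authentication is simpler to set up, but the account key grants full access to the entire storage account. For production, prefer Service Principal authentication, whose access can be limited through role assignments.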

  • Use Limit when working with very large datasets to avoid memory issues.

  • Schema inference scans the data to detect column types, which adds overhead; for large or complex files, provide schema definitions instead.

  • Snappy is the default Avro compression type, optimized for speed.