Azure Blob Reader
The Azure Blob Storage Reader component reads data files from an Azure Blob container. It supports multiple authentication methods, including Shared Key and Service Principal (Client Secret) authentication. Files can be read in multiple formats such as CSV, JSON, Parquet, Avro, and XML.
Configuration Sections
The component's configuration is organized into the following sections:
Basic Information
Meta Information
Resource Configuration
Connection Validation
Authentication Methods
1. Shared Key Authorization
Authenticate using the Account Key and Account Name of your Azure storage account.
| Field | Description | Example | Required |
| --- | --- | --- | --- |
| Account Key | Key used to authorize access to the storage account. | xxxxx12345... | Yes |
| Account Name | Name of the Azure storage account. | myazureaccount | Yes |
| Container | Name of the container containing the files. | sales-data | Yes |
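To sanity-check Shared Key access outside the component, a minimal sketch using the azure-storage-blob Python SDK (the SDK and the placeholder values below are illustrative, not part of the component) might look like this:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder values mirroring the table above.
ACCOUNT_NAME = "myazureaccount"
ACCOUNT_KEY = "xxxxx12345..."  # Shared Key of the storage account
CONTAINER = "sales-data"

# With Shared Key auth, the account key itself is passed as the credential.
service = BlobServiceClient(
    account_url=f"https://{ACCOUNT_NAME}.blob.core.windows.net",
    credential=ACCOUNT_KEY,
)

for blob in service.get_container_client(CONTAINER).list_blobs():
    print(blob.name)  # e.g. 2025_sales.csv
```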
2. Service Principal (Client Secret)
Authenticate using an Azure AD application (service principal).
| Field | Description | Example | Required |
| --- | --- | --- | --- |
| Client ID | Application (client) ID assigned by Azure AD. | 2c76b0a9-xxxx-xxxx-xxxx-abcdef | Yes |
| Tenant ID | Globally unique identifier (GUID) of your tenant. | 72f988bf-xxxx-xxxx-xxxx-abcdef | Yes |
| Client Secret | Secret key (password) of the service principal. | ******** | Yes |
| Account Name | Name of the Azure storage account. | myazureaccount | Yes |
| Container | Name of the container containing the files. | finance-data | Yes |
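Service Principal access can be verified the same way with azure-identity's ClientSecretCredential (illustrative; the service principal must also hold a data-plane role such as Storage Blob Data Reader on the account):

```python
from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient

# Placeholder values mirroring the table above.
credential = ClientSecretCredential(
    tenant_id="72f988bf-xxxx-xxxx-xxxx-abcdef",
    client_id="2c76b0a9-xxxx-xxxx-xxxx-abcdef",
    client_secret="********",
)

service = BlobServiceClient(
    account_url="https://myazureaccount.blob.core.windows.net",
    credential=credential,
)

for blob in service.get_container_client("finance-data").list_blobs():
    print(blob.name)
```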
File Options
| Field | Description | Example | Required |
| --- | --- | --- | --- |
| File Type | Supported file formats: CSV, JSON, PARQUET, AVRO, XML. | CSV | Yes |
| Read Directory | If enabled, reads all blobs in the container. | true (default) | Yes |
| Blob Name | Specific blob to read (appears only if Read Directory is disabled). | 2025_sales.csv | Conditional |
| Limit | Maximum number of records to read. | 1000 | Optional |
| Query | Spark SQL query; use inputDf as the table name. | SELECT * FROM inputDf | Optional |
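The Query and Limit options map naturally onto Spark operations. A sketch of the likely behavior in PySpark (assuming the component registers the loaded data as a temporary view named inputDf; the exact internals are not documented here):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-in for the DataFrame loaded from the blob(s).
input_df = spark.createDataFrame(
    [("C001", 1500), ("C002", 800)], ["customer_id", "amount"]
)

# Query: the data is exposed under the table name inputDf.
input_df.createOrReplaceTempView("inputDf")
result = spark.sql("SELECT * FROM inputDf")

# Limit: cap the number of records passed downstream.
result.limit(1000).show()
```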
Column Filtering
You can select, rename, and cast columns before ingestion.

| Field | Description | Example |
| --- | --- | --- |
| Source Field | Name of the column in the blob. | customer_id |
| Destination Field | Alias for the column name. | cust_id |
| Column Type | Data type of the column. | STRING |
Additional Actions:
Upload: Upload a file (CSV, JSON, Excel). Column names are auto-populated.
Download Data: Download column filter details in JSON format.
Delete Data: Clear all column filter details.
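Functionally, a column-filter row corresponds to a select-rename-cast in Spark. A minimal sketch (hypothetical function name; the example row matches the table above):

```python
from pyspark.sql import DataFrame
from pyspark.sql.functions import col

def apply_column_filter(df: DataFrame) -> DataFrame:
    """One filter row: source=customer_id, dest=cust_id, type=STRING."""
    return df.select(col("customer_id").cast("string").alias("cust_id"))
```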
File Type-Specific Configurations
CSV
Header: Enable if the first row contains column headers.
Infer Schema: Enable automatic schema detection.
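These toggles correspond to standard Spark CSV reader options (illustrative; the path and names below are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.read
      .option("header", "true")       # first row holds column names
      .option("inferSchema", "true")  # detect column types automatically
      .csv("wasbs://sales-data@myazureaccount.blob.core.windows.net/2025_sales.csv"))
```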
JSON
Multiline: Enable if JSON records span multiple lines.
Charset: Define the encoding (e.g., UTF-8, ISO-8859-1).
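In Spark's JSON reader these likely map to the multiLine and encoding options (a sketch; the path is a placeholder):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.read
      .option("multiLine", "true")   # records may span multiple lines
      .option("encoding", "UTF-8")   # charset of the input files
      .json("wasbs://logs@myazureaccount.blob.core.windows.net/"))
```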
PARQUET
No additional fields required.
AVRO
Compression: Select Deflate or Snappy (default).
Compression Level: Available if Deflate is selected. Choose 0–9, where higher values increase compression.
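In plain Spark these settings correspond to the spark-avro configuration keys shown below (a sketch; note that in Spark these confs govern Avro writes, so the component may apply them differently):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Requires the spark-avro package on the classpath.
spark.conf.set("spark.sql.avro.compression.codec", "deflate")
spark.conf.set("spark.sql.avro.deflate.level", "9")  # 0-9; higher = smaller output

df = spark.read.format("avro").load("wasbs://data@myazureaccount.blob.core.windows.net/")
```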
XML
Path: Path of the XML file.
Root Tag: Root element of the XML.
Row Tags: Tag identifying row-level elements.
Join Row Tags: Enable to combine multiple row tags.
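These fields resemble the options of the spark-xml connector; a hedged sketch (requires the com.databricks:spark-xml package; in spark-xml, rootTag is primarily a write-side option, so the component's Root Tag handling may differ):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.read.format("xml")
      .option("rowTag", "record")  # maps to the Row Tags field
      .load("wasbs://data@myazureaccount.blob.core.windows.net/orders.xml"))
```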
Example Configurations
Example 1: CSV File with Shared Key
Authentication: Shared Key
Account Name: myazureaccount
Account Key: ********
Container: sales-data
File Type: CSV
Read Directory: false
Blob Name: 2025_sales.csv
Header: true
Infer Schema: true
Query: SELECT customer_id, amount FROM inputDf WHERE amount > 1000
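For orientation, a rough PySpark equivalent of this configuration (a sketch only; the account key and the Hadoop-Azure wasbs filesystem setup are illustrative, and the component performs these steps internally):

```python
from pyspark.sql import SparkSession

# Shared Key auth via the Hadoop-Azure connector (placeholder key).
spark = (SparkSession.builder
         .config("spark.hadoop.fs.azure.account.key."
                 "myazureaccount.blob.core.windows.net", "********")
         .getOrCreate())

df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("wasbs://sales-data@myazureaccount.blob.core.windows.net/2025_sales.csv"))

df.createOrReplaceTempView("inputDf")
result = spark.sql(
    "SELECT customer_id, amount FROM inputDf WHERE amount > 1000"
)
```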
Example 2: JSON File with Service Principal (Client Secret)
Authentication: Service Principal
Client ID: 2c76b0a9-xxxx-xxxx-xxxx-abcdef
Tenant ID: 72f988bf-xxxx-xxxx-xxxx-abcdef
Client Secret: ********
Account Name: myazureaccount
Container: logs
File Type: JSON
Multiline: true
Charset: UTF-8
Read Directory: true
Limit: 5000
Query: SELECT event_type, timestamp FROM inputDf WHERE event_type = 'ERROR'
Notes
Shared Key authentication is simpler but less secure than Service Principal. For production, prefer Service Principal.
Use Limit when working with very large datasets to avoid memory issues.
Schema inference may add overhead; provide schema definitions for large/complex files.
Snappy is the default Avro compression type, optimized for speed.