SFTP Stream Reader
The SFTP Stream Reader component connects to an SFTP server and reads data streams in real time. It supports username/password authentication or SSH key-based (PEM/PPK) authentication, and it can read CSV, JSON, and XML files.
Configuration Sections
The SFTP Stream Reader component configurations are organized into the following sections:
Basic Information
Meta Information
Resource Configuration
Connection Validation
Meta Information Configuration
Field | Description | Example | Required
Host | Hostname or IP address of the SFTP server. | sftp.example.com | Yes
Port | Port number of the SFTP server. | 22 | Yes
Username | Username for authentication. | datauser | Yes
Authentication | Authentication type. Options: Password or PEM/PPK File. | Password | Yes
Password | Password for authentication (when Authentication is Password). | ******** | Conditional
PEM/PPK File | SSH private key file for authentication (when Authentication is PEM/PPK File). Must be uploaded via the UI. | id_rsa.ppk | Conditional
Reader Path | Directory path on the SFTP server where the file(s) are located. | /data/incoming/ | Yes
Channel | Streaming channel type. Only SFTP is supported. | SFTP | Yes
Add File Name | Include the file name in the output data. | true | Optional
File Type | Type of file being read. Options: CSV, JSON, XML. | CSV | Yes
File Metadata Topic | Kafka event name to which file metadata is sent. | sftp_metadata_event | Optional
Column Filter | Select specific columns to read. Optionally provide an alias and a column type for each column. | id AS employee_id, name STRING | Optional
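The Column Filter syntax is documented only by the example above. The following Python sketch shows one plausible reading of it: comma-separated entries of the form "column [AS alias] [TYPE]". The names ColumnSpec, parse_column_filter, and apply_column_filter are hypothetical, and the component's actual parser may accept more or different syntax.

```python
from typing import List, NamedTuple, Optional


class ColumnSpec(NamedTuple):
    """One Column Filter entry (hypothetical representation)."""
    name: str
    alias: Optional[str]
    type: Optional[str]


def parse_column_filter(expr: str) -> List[ColumnSpec]:
    """Parse entries of the assumed form 'column [AS alias] [TYPE]', separated by commas."""
    specs = []
    for entry in expr.split(","):
        tokens = entry.split()
        if not tokens:
            continue  # skip empty entries, e.g. from a trailing comma
        name, rest = tokens[0], tokens[1:]
        alias = None
        if rest and rest[0].upper() == "AS":
            if len(rest) < 2:
                raise ValueError(f"Missing alias after AS in: {entry!r}")
            alias, rest = rest[1], rest[2:]
        if len(rest) > 1:
            raise ValueError(f"Unexpected tokens in column filter entry: {entry!r}")
        col_type = rest[0].upper() if rest else None
        specs.append(ColumnSpec(name, alias, col_type))
    return specs


def apply_column_filter(record: dict, specs: List[ColumnSpec]) -> dict:
    """Keep only the selected columns, renaming those with an alias (no type casting)."""
    return {(spec.alias or spec.name): record[spec.name] for spec in specs}
```

For example, parse_column_filter("id AS employee_id, name STRING") yields two specs: id renamed to employee_id with no explicit type, and name typed as STRING.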
File Type Configuration
CSV
When CSV is selected as the file type:
Header: Enable to use the first row as column headers.
Infer Schema: Enable to automatically detect the schema.
Schema: Optionally paste a Spark schema definition for the CSV.
Example Schema:
{
  "fields": [
    {"name": "id", "type": "integer"},
    {"name": "name", "type": "string"},
    {"name": "salary", "type": "double"}
  ]
}
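To illustrate how Header and an explicit Schema work together, here is a minimal Python sketch using the standard csv module rather than Spark. It assumes that, with Header enabled, the first row is skipped and column names come from the schema, that only the three types in the example schema occur, and that empty cells become None. The function name read_csv_with_schema is hypothetical.

```python
import csv
import io
import json

# The Example Schema above, in the same simplified form.
SCHEMA_JSON = """
{"fields": [
  {"name": "id", "type": "integer"},
  {"name": "name", "type": "string"},
  {"name": "salary", "type": "double"}
]}
"""

# Map schema type names to Python converters (assumption: only these three types).
CONVERTERS = {"integer": int, "string": str, "double": float}


def read_csv_with_schema(text: str, schema: dict, header: bool = True) -> list:
    """Parse CSV text and cast each column according to the schema."""
    fields = schema["fields"]
    rows = csv.reader(io.StringIO(text))
    if header:
        next(rows, None)  # skip the header row; names come from the schema
    records = []
    for row in rows:
        if not row:
            continue  # ignore blank lines
        records.append({
            field["name"]: None if value == "" else CONVERTERS[field["type"]](value)
            for field, value in zip(fields, row)
        })
    return records
```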
JSON
When JSON is selected as the file type:
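Multiline: Enable if a single JSON record spans multiple lines (for example, a pretty-printed object or a JSON array). Leave it disabled for files that contain one JSON record per line.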
Charset: Specify the character encoding (e.g., UTF-8).
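The difference between the two JSON layouts, and the role of Charset, can be sketched in Python with the standard json module. This is an illustration under assumptions, not the component's implementation: with Multiline off, each non-empty line is treated as one record (JSON Lines); with Multiline on, the whole file is parsed as one document that is either an object or an array of objects. The function name read_json_records is hypothetical.

```python
import json


def read_json_records(data: bytes, multiline: bool = False, charset: str = "UTF-8") -> list:
    """Decode raw bytes with the given charset, then parse JSON records.

    multiline=False: one JSON object per line (JSON Lines).
    multiline=True: the whole input is one JSON document, either a single
    object or an array of objects, possibly spread over many lines.
    """
    text = data.decode(charset)
    if not multiline:
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    document = json.loads(text)
    return document if isinstance(document, list) else [document]
```

Reading a pretty-printed file with Multiline disabled fails, because no single line holds a complete JSON value; this is why the setting must match the file layout.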
XML
When XML is selected as the file type, the following options are available:
Infer Schema: Enable to automatically detect column schema.
Path: Path of the XML file.
Root Tag: The root tag of the XML file.
Row Tags: The tag identifying rows/documents in the XML.
Join Row Tags: Enable to join multiple row tags.
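The following Python sketch, using the standard xml.etree.ElementTree module, illustrates what Root Tag and Row Tags mean: every element matching the row tag under the expected root becomes one record, with its attributes and direct child elements as fields. It does not model Infer Schema (all values stay strings) or Join Row Tags. The function name read_xml_rows and the sample invoice data are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_xml_rows(text: str, root_tag: str, row_tag: str) -> list:
    """Return one dict per <row_tag> element found under <root_tag>.

    Attributes and direct child elements become string fields; deeper
    nesting is not flattened in this sketch.
    """
    root = ET.fromstring(text)
    if root.tag != root_tag:
        raise ValueError(f"Expected root tag <{root_tag}>, found <{root.tag}>")
    rows = []
    for element in root.iter(row_tag):
        record = dict(element.attrib)
        for child in element:
            record[child.tag] = (child.text or "").strip()
        rows.append(record)
    return rows


# Hypothetical sample matching the XML example configuration (Root Tag: invoices, Row Tags: invoice).
SAMPLE = """<invoices>
  <invoice id="INV-1"><customer>Acme</customer><total>120.50</total></invoice>
  <invoice id="INV-2"><customer>Globex</customer><total>75.00</total></invoice>
</invoices>"""
```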
File Metadata Handling
File Metadata Topic: Sends metadata (such as file name, size, timestamp) to a Kafka event stream for downstream processing.
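As a rough illustration of the kind of message the File Metadata Topic carries, the Python sketch below serializes a file name, size, and modification time as JSON. The field names (file_name, size_bytes, modified_at) and the function build_metadata_event are hypothetical; the component's actual payload may differ.

```python
import json
from datetime import datetime, timezone


def build_metadata_event(file_name: str, size_bytes: int, mtime_epoch: float) -> str:
    """Serialize file metadata as a JSON message (hypothetical field names)."""
    modified = datetime.fromtimestamp(mtime_epoch, tz=timezone.utc)
    return json.dumps({
        "file_name": file_name,
        "size_bytes": size_bytes,
        "modified_at": modified.isoformat(),  # ISO 8601 timestamp in UTC
    })
```

In a real reader, the size and modification time would come from the SFTP server's file attributes (for example, st_size and st_mtime on paramiko's SFTPAttributes), and the resulting string would be published with a Kafka producer client. Neither step is shown here.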
Uploading and Downloading Schema
Upload File: Upload a sample file (CSV or JSON) from your local system to configure the schema quickly.
Download Data (Schema): Download the schema structure in JSON format.
Example Configuration
Reading a CSV File
Host: sftp.example.com
Port: 22
Username: datauser
Authentication: Password
Password: ********
Reader Path: /data/incoming/sales.csv
File Type: CSV
Header: true
Infer Schema: true
Reading an XML File
Host: sftp.example.com
Port: 22
Username: datauser
Authentication: PEM/PPK File
PEM/PPK File: id_rsa.ppk
Reader Path: /data/xml/invoices.xml
File Type: XML
Root Tag: invoices
Row Tags: invoice
Join Row Tags: false
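Configurations like the two examples above can be checked against the rules in the Meta Information table before a connection is attempted. The following Python sketch does this; the function validate_config and its error messages are hypothetical. It assumes that Channel defaults to SFTP when omitted (as in the examples), that Port must be an integer from 1 to 65535, and that the conditional credential field has the same name as the selected Authentication option.

```python
# Field names follow the Meta Information table; the checks themselves are assumptions.
REQUIRED_FIELDS = ("Host", "Port", "Username", "Authentication", "Reader Path", "File Type")
AUTH_TYPES = ("Password", "PEM/PPK File")  # the conditional field shares the option's name
FILE_TYPES = ("CSV", "JSON", "XML")


def validate_config(config: dict) -> list:
    """Return a list of problems; an empty list means the configuration looks valid."""
    errors = [f"Missing required field: {name}" for name in REQUIRED_FIELDS
              if config.get(name) in (None, "")]

    port = config.get("Port")
    if port not in (None, "") and not (isinstance(port, int) and 1 <= port <= 65535):
        errors.append(f"Port must be an integer between 1 and 65535, got {port!r}")

    auth = config.get("Authentication")
    if auth and auth not in AUTH_TYPES:
        errors.append(f"Unknown authentication type: {auth!r}")
    elif auth and config.get(auth) in (None, ""):
        errors.append(f"{auth} is required when Authentication is {auth}")

    file_type = config.get("File Type")
    if file_type and file_type not in FILE_TYPES:
        errors.append(f"Unsupported file type: {file_type!r}")

    # Channel is required in the table but omitted from the examples; assume it defaults to SFTP.
    if config.get("Channel", "SFTP") != "SFTP":
        errors.append("Only the SFTP channel is supported")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a UI report every issue at once.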
Notes
Use Password authentication for quick setup; use PEM/PPK File for enhanced security.
Schema inference scans the data to determine column types, which can slow down processing of very large files. For production workloads, provide the schema explicitly.
The File Metadata Topic integration with Kafka allows real-time monitoring and downstream triggers.