SFTP Stream Reader

The SFTP Stream Reader component allows you to connect to an SFTP server and read data streams in real time. It supports either username/password or SSH key-based (PEM/PPK) authentication and can process multiple file formats, such as CSV, JSON, and XML.
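
The component is configured through the UI rather than written as code, but as an informal illustration of the idea, the Python sketch below connects to an SFTP server with the paramiko library and streams the files found under a directory. The host, credentials, and path are placeholder values borrowed from the examples on this page; this is not the component's internal implementation.

import paramiko

# Open an SSH transport to the server (Host and Port), then authenticate.
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="datauser", password="secret")
sftp = paramiko.SFTPClient.from_transport(transport)

# List files under the Reader Path and stream each one line by line.
for name in sftp.listdir("/data/incoming/"):
    with sftp.open("/data/incoming/" + name, "rb") as fh:
        for raw in fh:
            print(raw.decode("utf-8").rstrip())  # placeholder for downstream handling

sftp.close()
transport.close()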

Configuration Sections

The configuration of the SFTP Stream Reader component is organized into the following sections:

  • Basic Information

  • Meta Information

  • Resource Configuration

  • Connection Validation

Meta Information Configuration

Each parameter below lists a description, an example value, and whether it is required.

  • Host: Hostname or IP address of the SFTP server. Example: sftp.example.com. Required: Yes.

  • Port: Port number of the SFTP server. Example: 22. Required: Yes.

  • Username: Username for authentication. Example: datauser. Required: Yes.

  • Authentication: Authentication type. Options: Password or PEM/PPK File. Example: Password. Required: Yes.

  • Password: Password for authentication (when Password is selected). Example: ********. Required: Conditional.

  • PEM/PPK File: SSH key file for authentication (when PEM/PPK File is selected); the file must be uploaded via the UI. Example: id_rsa.ppk. Required: Conditional.

  • Reader Path: Directory path on the SFTP server where the file(s) are located. Example: /data/incoming/. Required: Yes.

  • Channel: Streaming channel type. Only SFTP is supported. Example: SFTP. Required: Yes.

  • Add File Name: Whether to include the source file name in the output data. Example: true. Required: Optional.

  • File Type: Type of file being read. Options: CSV, JSON, XML. Example: CSV. Required: Yes.

  • File Metadata Topic: Kafka event name to which file metadata is sent. Example: sftp_metadata_event. Required: Optional.

  • Column Filter: Select specific columns to read; optionally provide an alias and a column type (see the sketch after this list). Example: id AS employee_id, name STRING. Required: Optional.
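
As an informal illustration only, the Column Filter expression in the example above acts like a SQL-style projection with an optional alias and an optional type; a rough PySpark equivalent, using a small made-up DataFrame, might look like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("column-filter-example").getOrCreate()
df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])

# Rough equivalent of "id AS employee_id, name STRING": keep the selected
# columns, rename one, and cast the other to the declared type.
filtered = df.selectExpr("id AS employee_id", "CAST(name AS STRING) AS name")
filtered.show()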

File Type Configuration

CSV

When CSV is selected as the file type:

  • Header: Enable to use the first row as column headers.

  • Infer Schema: Enable to automatically detect the schema.

  • Schema: Optionally paste a Spark schema definition for the CSV.

Example Schema:

{
  "fields": [
    {"name": "id", "type": "integer"},
    {"name": "name", "type": "string"},
    {"name": "salary", "type": "double"}
  ]
}
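
The schema block above is a simplified, documentation-style example. As an informal sketch of how such a definition maps onto explicit Spark types when reading a CSV with a header row (PySpark shown for illustration; the file path and session name are placeholders):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DoubleType

spark = SparkSession.builder.appName("csv-schema-example").getOrCreate()

# Explicit schema equivalent to the example above; providing it skips inference.
schema = StructType([
    StructField("id", IntegerType()),
    StructField("name", StringType()),
    StructField("salary", DoubleType()),
])

# header=True: treat the first row as column names rather than data.
df = spark.read.csv("/tmp/sales.csv", header=True, schema=schema)
df.printSchema()

Providing the schema up front avoids the inference pass mentioned in the Notes at the end of this page.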

JSON

When JSON is selected as the file type, the following options are available (see the sketch after this list):

  • Multiline: Enable if the JSON file contains multiline records.

  • Charset: Specify the character encoding (e.g., UTF-8).
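
The Multiline and Charset options loosely correspond to Spark's multiLine and encoding read options for JSON. As an informal sketch with placeholder values (not the component's internal implementation):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-options-example").getOrCreate()

# multiLine: records may span several lines; encoding: character set of the file.
df = (spark.read
      .option("multiLine", "true")
      .option("encoding", "UTF-8")
      .json("/tmp/records.json"))    # placeholder path
df.printSchema()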

XML

When XML is selected as the file type, the following options are available (see the sketch after this list):

  • Infer Schema: Enable to automatically detect column schema.

  • Path: Path of the XML file.

  • Root Tag: The root tag of the XML file.

  • Row Tags: The tag identifying rows/documents in the XML.

  • Join Row Tags: Enable to join multiple row tags.
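
These options resemble those of the open-source spark-xml data source. The sketch below is an informal illustration only; it assumes the spark-xml package is available on the cluster, uses placeholder values, and is not the component's internal implementation:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("xml-options-example").getOrCreate()

# rowTag identifies the element that becomes one row; the root element wraps them.
df = (spark.read
      .format("xml")
      .option("rowTag", "invoice")
      .load("/tmp/invoices.xml"))    # placeholder path
df.printSchema()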

File Metadata Handling

  • File Metadata Topic: Sends metadata (such as file name, size, and timestamp) to a Kafka event stream for downstream processing, as sketched below.
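
As an informal illustration of a downstream consumer, the sketch below reads events from the metadata topic with the kafka-python client. The broker address and the payload field names are assumptions for illustration, not the component's documented message format:

import json
from kafka import KafkaConsumer  # kafka-python client

consumer = KafkaConsumer(
    "sftp_metadata_event",                 # File Metadata Topic from this page
    bootstrap_servers="localhost:9092",    # placeholder broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for event in consumer:
    meta = event.value
    # Hypothetical field names; trigger downstream processing when a file lands.
    print(meta.get("file_name"), meta.get("file_size"), meta.get("timestamp"))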

Uploading and Downloading Schema

  • Upload File: Upload a CSV or JSON file from your system for quick schema configuration.

  • Download Data (Schema): Download the schema structure in JSON format.

Example Configuration

Reading a CSV File

Host: sftp.example.com
Port: 22
Username: datauser
Authentication: Password
Password: ********
Reader Path: /data/incoming/sales.csv
File Type: CSV
Header: true
Infer Schema: true

Reading an XML File

Host: sftp.example.com
Port: 22
Username: datauser
Authentication: PEM/PPK
PEM/PPK File: id_rsa.ppk
Reader Path: /data/xml/invoices.xml
File Type: XML
Root Tag: invoices
Row Tags: invoice
Join Row Tags: false

Notes

  • Use Password authentication for quick setup; use PEM/PPK File for enhanced security.

  • Schema inference may impact performance for very large files; for production workloads, provide a schema explicitly.

  • The File Metadata Topic integration with Kafka allows real-time monitoring and downstream triggers.