ES Writer

The Elasticsearch Writer component writes data into an Elasticsearch index. It supports column filtering, schema enforcement, and configurable save modes. Authentication against the target Elasticsearch cluster uses username and password credentials.
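A write of this kind is conceptually a Spark DataFrame write through the elasticsearch-hadoop connector. The following is a minimal sketch, not the component's actual implementation; it assumes the elasticsearch-spark JAR is on the Spark classpath, and the host, port, index, and credential values are placeholders mirroring the examples in the tables below.

```python
# Minimal sketch of an equivalent append using the elasticsearch-hadoop
# (elasticsearch-spark) connector; assumes the connector JAR is on the
# Spark classpath. All values are placeholders from the tables below.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("es-writer-sketch").getOrCreate()

df = spark.createDataFrame(
    [("1001", "Alice"), ("1002", "Bob")],
    ["emp_id", "name"],
)

(df.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "192.168.1.25")           # Host IP Address
    .option("es.port", "9200")                    # Port
    .option("es.net.http.auth.user", "elastic")   # Username
    .option("es.net.http.auth.pass", "********")  # Password
    .mode("append")                               # Save Mode: Append
    .save("employee_index/employee"))             # Index ID / Resource Type
                                                  # (omit the type on ES 7+)
```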

Configuration Sections

The Elasticsearch Writer configurations are organized into the following sections:

  • Basic Information

  • Meta Information

  • Resource Configuration

  • Connection Validation

Meta Information Tab

| Parameter | Description | Example |
| --- | --- | --- |
| Host IP Address | Host IP address of the Elasticsearch cluster. | 192.168.1.25 |
| Port | Port number for Elasticsearch. | 9200 |
| Index ID | Target index ID where data will be written. Each document in Elasticsearch is uniquely identified within the index. | employee_index |
| Mapping ID | Unique identifier for a mapping definition. Defines the schema of documents in the index. | employee_mapping |
| Resource Type | Groups related documents logically within an index. | employee |
| Username | Username for Elasticsearch authentication. | elastic |
| Password | Password for Elasticsearch authentication. | ******** |
| Schema File Name | Upload a Spark schema file in JSON format to define the data schema (see the sketch after this table). | employee_schema.json |
| Save Mode | Defines how data is written to the index. Options: Append. | Append |
| Selected Columns | Choose specific columns to write. Optionally define alias names and data types for each. | emp_id AS employee_id (STRING) |
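The Schema File Name field takes a Spark schema serialized as JSON, which is most likely the output of StructType.json(). A minimal sketch of generating such a file, with illustrative field names:

```python
# Generate a Spark schema file in JSON format (field names are illustrative).
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("employee_id", StringType(), nullable=True),
    StructField("name", StringType(), nullable=True),
])

# Produces e.g. employee_schema.json with content like:
# {"type":"struct","fields":[{"name":"employee_id","type":"string",
#  "nullable":true,"metadata":{}}, ...]}
with open("employee_schema.json", "w") as f:
    f.write(schema.json())
```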

Save Mode

  • Append: Inserts new documents into the specified Elasticsearch index.

⚠️ Note: The ES Writer does not support Overwrite or Upsert save modes. For updates or deletes, use Elasticsearch APIs or connectors that support document updates.
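For per-document updates outside the writer, the official Python client can be used directly. A minimal sketch, assuming elasticsearch-py 8.x and the credentials from the table above; the document id is illustrative:

```python
# Partial update of an existing document with the official Python client
# (elasticsearch-py 8.x keyword arguments assumed); the ES Writer itself
# only appends.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "http://192.168.1.25:9200",
    basic_auth=("elastic", "********"),  # 8.x keyword; 7.x uses http_auth
)

# Apply a partial update to an existing document by id ("1001" is illustrative).
es.update(index="employee_index", id="1001", doc={"department": "Sales"})
```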

Column Filtering

The Selected Columns section controls which fields are written to the index (a Spark equivalent is sketched after the table).

| Field | Description | Example |
| --- | --- | --- |
| Column Name | Source column from input data. | emp_id |
| Alias | Target column name in Elasticsearch. | employee_id |
| Column Type | Data type of the field (e.g., STRING, INT, DATE). | STRING |
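In Spark terms, the example row above (emp_id AS employee_id (STRING)) amounts to a cast plus a rename applied before the write. A minimal sketch, reusing the df from the earlier example:

```python
from pyspark.sql.functions import col

# emp_id AS employee_id (STRING): cast the source column to string and
# rename it to the target field name before writing to the index.
filtered = df.select(col("emp_id").cast("string").alias("employee_id"))
```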

Notes

  • Ensure that the index and mapping are created in Elasticsearch before activating the pipeline (a sketch follows these notes).

  • Schema mismatches between Spark schema and Elasticsearch mapping may result in rejected documents.

  • Use column filtering to limit the data written and enforce consistency with the mapping.
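A minimal sketch of pre-creating the index and mapping with elasticsearch-py (8.x keyword arguments assumed); the field names and types mirror the examples above and are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "http://192.168.1.25:9200",
    basic_auth=("elastic", "********"),
)

# Create the target index with an explicit mapping before the pipeline runs;
# documents whose fields do not match this mapping may be rejected.
es.indices.create(
    index="employee_index",
    mappings={
        "properties": {
            "employee_id": {"type": "keyword"},
            "name": {"type": "text"},
        }
    },
)
```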