ES Writer
The ES Writer task is used to write data into an Elasticsearch cluster. Elasticsearch is an open-source, distributed search and analytics engine built on Apache Lucene. It is designed for real-time indexing, search, and analysis of structured, semi-structured, and unstructured data at scale.
Configuring the Meta Information Tab
To configure the ES Writer task:
Drag the ES Writer task to the Workspace.
Click on the task to open its configuration tabs.
The Meta Information tab opens by default. Configure the following fields:
Host IP Address
Enter the host IP address of the Elasticsearch node.
Port
Enter the port number used to connect to Elasticsearch (default: 9200).
Index ID
Provide the Index ID where the data should be written. An index in Elasticsearch is a collection of documents with similar characteristics. Each document within an index has a unique identifier.
Mapping ID
Provide the Mapping ID. A mapping defines the schema of documents in an index. The mapping ID uniquely identifies the mapping definition, controls field data types, and determines how documents are indexed and queried.
Resource Type
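Specify the resource type. Resource types logically group related documents within an index and are defined at index creation. Note that mapping types are deprecated in Elasticsearch 7.x, where _doc is the conventional type name, and are removed in 8.x.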
Username
Enter the Elasticsearch username for authentication.
Password
Enter the Elasticsearch password for authentication.
Schema File Name
Upload a Spark schema file in JSON format to enforce the schema of the data being written.
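As an illustration of what such a schema file can look like, the sketch below builds a small schema in the JSON layout that Spark itself produces for a StructType (for example via df.schema.json() in PySpark). The column names are hypothetical, and it is an assumption that the task accepts exactly this layout; check a schema exported from your own Spark job if in doubt.

```python
import json

# Hypothetical columns for illustration only; replace them with the
# fields of the data you actually write to Elasticsearch.
schema = {
    "type": "struct",
    "fields": [
        {"name": "order_id", "type": "long", "nullable": False, "metadata": {}},
        {"name": "customer", "type": "string", "nullable": True, "metadata": {}},
        {"name": "amount", "type": "double", "nullable": True, "metadata": {}},
        {"name": "created_at", "type": "timestamp", "nullable": True, "metadata": {}},
    ],
}

# Serialize to the JSON text that would be saved as the schema file.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Saving the printed text to a .json file gives a candidate file to upload in this field.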
Save Mode
Select the save mode from the drop-down. The available option is Append, which adds data to the existing index.
Selected Columns
Choose specific columns to write. You can also assign alias names and define the desired data type for each selected column.
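The effect of selecting columns with aliases and data types can be pictured with a minimal plain-Python sketch. It is not the task's implementation; the column names, aliases, and the project helper are hypothetical and only show the idea of keeping a subset of columns, renaming them, and converting their types before a record is written.

```python
# Hypothetical selection: source column -> (alias, type converter).
SELECTED_COLUMNS = {
    "cust_nm": ("customer_name", str),
    "ord_amt": ("order_amount", float),
    "ord_qty": ("order_quantity", int),
}


def project(record):
    """Keep only the selected columns, rename them to their aliases,
    and convert each value to the requested type."""
    return {
        alias: convert(record[source])
        for source, (alias, convert) in SELECTED_COLUMNS.items()
    }


# "internal_flag" is not selected, so it is dropped from the output.
row = {"cust_nm": "Acme", "ord_amt": "129.50", "ord_qty": "3", "internal_flag": 1}
document = project(row)
print(document)
```

Columns that are not selected never reach the index, which is a simple way to keep internal or sensitive fields out of Elasticsearch.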
Save & Next Steps
Once configuration is complete, click Save Task In Storage to persist your settings.
Validate the connection and schema by running a test write.
Monitor task logs to verify successful indexing.
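Besides the task logs, one way to confirm that documents arrived is to query the index's document count through the Elasticsearch _count API. The sketch below assumes plain HTTP and basic authentication, and uses placeholder values for the host, index, and credentials; adjust these (and switch to HTTPS if your cluster requires it) to match the values entered in the Meta Information tab.

```python
import base64
import json
import urllib.request


def build_count_request(host, port, index, username, password):
    """Build a GET request for the index's _count endpoint with basic auth."""
    url = f"http://{host}:{port}/{index}/_count"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_document_count(request, timeout=10):
    """Send the request and return the "count" field of the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["count"]


# Placeholder values for illustration; nothing is sent until
# fetch_document_count(request) is called against a reachable cluster.
request = build_count_request("192.0.2.10", 9200, "orders", "elastic", "changeme")
print(request.full_url)
```

Calling fetch_document_count(request) before and after a test write and comparing the two numbers gives a quick check that the Append save mode added the expected documents.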