ES Reader
The Elasticsearch Reader (ES Reader) component allows you to read and query data stored in an Elasticsearch index. It connects to the Elasticsearch cluster using username and password authentication and retrieves documents for use in your pipeline.
Configuration Sections
The ES Reader configuration is organized into the following sections:
Basic Information
Meta Information
Resource Configuration
Connection Validation
Meta Information Configuration
Field: Host IP Address
Description: IP address of the Elasticsearch host.
Example: 192.168.1.10
Required: Yes

Field: Port
Description: Port number used to connect to Elasticsearch.
Example: 9200
Required: Yes

Field: Index ID
Description: Unique identifier of a document within the index. An index groups documents, and each document in it has a unique ID.
Example: employee_001
Required: Yes

Field: Resource Type
Description: Logical grouping of related documents within an index, defined during index creation.
Example: employee, department
Required: Yes

Field: Is Date Rich
Description: Enable if fields contain date/time information. Allows advanced queries such as range filtering and date arithmetic.
Example: true
Required: No

Field: Username
Description: Username for authentication.
Example: elastic
Required: Yes

Field: Password
Description: Password for authentication.
Example: ******
Required: Yes

Field: Query
Description: Spark SQL query used to retrieve data from the index. Supports advanced queries and filters.
Example: See Example Usage below
Required: Yes
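The fields above can be gathered into a single settings object and checked for completeness before a pipeline run. The following is a minimal Python sketch of that idea, not the component's actual implementation: the class name `ESReaderSettings`, its attribute names, and the `missing_required` helper are hypothetical, and the required/optional split simply mirrors the fields listed above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ESReaderSettings:
    """Hypothetical container mirroring the Meta Information fields."""
    host: str = ""
    port: int = 0
    index_id: str = ""
    resource_type: str = ""
    username: str = ""
    password: str = ""
    query: str = ""
    is_date_rich: bool = False  # the only field not marked as required


def missing_required(settings: ESReaderSettings) -> List[str]:
    """Return the names of required fields that were left empty."""
    optional = {"is_date_rich"}
    return [
        f.name
        for f in fields(settings)
        if f.name not in optional and not getattr(settings, f.name)
    ]


settings = ESReaderSettings(
    host="192.168.1.10",
    port=9200,
    index_id="employee_001",
    resource_type="employee",
    username="elastic",
    password="example-password",  # placeholder, not a real credential
    query="SELECT * FROM employee_index",
)
print(missing_required(settings))  # an empty list means nothing is missing
```

A check like this is cheap to run before Connection Validation, since it catches empty fields without contacting the cluster.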
Example Usage
1. Retrieve All Documents from an Index
SELECT * FROM employee_index;
Fetches all documents stored in employee_index.
2. Filter Documents by Date Range
SELECT * FROM logs
WHERE timestamp BETWEEN '2025-01-01' AND '2025-01-31';
Retrieves log entries for January 2025. This kind of range filter relies on the Is Date Rich option being enabled.
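If the timestamp field carries a time of day, an upper bound of '2025-01-31' may be read as midnight at the start of that day, which would leave out later entries from 31 January. A half-open range, from the first day of the month up to but not including the first day of the next month, is a common way to avoid this. The Python sketch below builds such a query string; the helper name `month_range_query` is hypothetical, and the table and column names are taken from the example above.

```python
from datetime import date


def month_range_query(table: str, column: str, year: int, month: int) -> str:
    """Build a Spark SQL query covering one whole calendar month.

    Uses a half-open interval: [first day, first day of the next month).
    Identifiers are interpolated as-is, so pass trusted names only.
    """
    start = date(year, month, 1)
    # Roll over to January of the following year after December.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return (
        f"SELECT * FROM {table} "
        f"WHERE {column} >= '{start.isoformat()}' "
        f"AND {column} < '{end.isoformat()}'"
    )


print(month_range_query("logs", "timestamp", 2025, 1))
# SELECT * FROM logs WHERE timestamp >= '2025-01-01' AND timestamp < '2025-02-01'
```

The resulting string can be pasted into the Query field in place of the BETWEEN form.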
3. Search by Index ID
SELECT * FROM employee_index WHERE _id = 'emp_102';
Fetches a single document by its unique ID (_id).
4. Advanced Query Example
SELECT name, department, hire_date
FROM employee_index
WHERE department = 'Engineering' AND hire_date > '2022-01-01';
Returns the name, department, and hire date of employees in the Engineering department hired after January 1, 2022.
Notes
Authentication is required for all connections. Ensure that valid username and password credentials are provided.
The Is Date Rich option must be enabled if the dataset includes date/time fields to allow date-based filtering and arithmetic operations.
Each document within an Elasticsearch index has a unique ID (_id). Elasticsearch generates this ID automatically when none is supplied at indexing time.
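Elasticsearch accepts username and password credentials over its REST API as HTTP Basic authentication. To illustrate what the Host IP Address, Port, Username, and Password fields amount to at the HTTP level, the sketch below builds (but does not send) a search request using only the Python standard library. The function name `build_search_request`, the plain `http` scheme, and the placeholder password are assumptions; the ES Reader's internal connection logic is not described here and may differ.

```python
import base64
import urllib.request


def build_search_request(host, port, index, username, password):
    """Build an unsent search request with an HTTP Basic auth header."""
    # Assumption: plain HTTP; a secured cluster would typically use https.
    url = f"http://{host}:{port}/{index}/_search"
    # Basic auth is "username:password", base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


req = build_search_request(
    "192.168.1.10", 9200, "employee_index", "elastic", "example-password"
)
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen` would actually contact the cluster; that step is left out so the sketch runs without a live Elasticsearch host.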