DB Reader
The DB Reader component is used to read data from supported databases into your pipeline. It supports multiple database drivers and provides options for table-based reads, SQL queries, partitioning, and SSL-enabled connections.
Supported Drivers
MySQL
Oracle
PostgreSQL
MS-SQL
ClickHouse
Snowflake
Redshift
Configuration Sections
The DB Reader component includes the following configuration sections:
Basic Information
Meta Information
Resource Configuration
Connection Validation
DB Reader Configuration
1. Table or Query Selection
You can either specify one or more table names or directly provide a SQL query. Only one option is required.
Table Name
Provide a single table name or multiple table names separated by commas.
Example:
employees, departments, salaries
Query
Enter a Spark SQL query for the given table(s).
Joins across multiple tables are supported.
Example:
SELECT e.name, d.department_name FROM employees e JOIN departments d ON e.dept_id = d.id
Note: When using join queries, you do not need to provide table names separately.
2. Fetch Size
Defines the maximum number of records processed in a single execution cycle.
Use this to control memory usage and improve query performance.
3. Partitioning
Partitioning enhances performance by distributing read operations.
Create Partition
Creates a sequence of indexing for parallel execution.
Once enabled, the query will not run on the server directly.
Partition By (available when partitioning is enabled):
Auto Increment: The number of partitions is incremented automatically.
Index: The number of partitions is based on the specified partition column.
4. SSL Configuration
SSL is supported for PostgreSQL and ClickHouse drivers only.
Enable SSL: Check this option to enable SSL.
Certificate Folder: Select the folder containing SSL certificates. Certificates must be uploaded through the Admin settings.
Required Files:
Certificate file (
.pem
)Key file (
.key
)
5. ClickHouse Driver Notes
For Spark components, the ClickHouse driver uses HTTP port instead of TCP port.
6. Redshift Driver Notes
Redshift configuration follows the same DB Reader setup, with the ability to specify either table names or queries.
Example Queries
Basic Filter Query
SELECT * FROM employee WHERE gender = 'Male';
Retrieves all employee records where gender is Male.
Join Query Across Two Tables
SELECT e.name, s.salary FROM employee e JOIN salary s ON e.emp_id = s.emp_id
Performs a join between the
employee
andsalary
tables.