DB Reader

The DB Reader component is used to read data from supported databases into your pipeline. It supports multiple database drivers and provides options for table-based reads, SQL queries, partitioning, and SSL-enabled connections.

Supported Drivers

  • MySQL

  • Oracle

  • PostgreSQL

  • MS-SQL

  • ClickHouse

  • Snowflake

  • Redshift

Configuration Sections

The DB Reader component includes the following configuration sections:

  • Basic Information

  • Meta Information

  • Resource Configuration

  • Connection Validation

DB Reader Configuration

1. Table or Query Selection

You can either specify one or more table names or directly provide a SQL query. Only one option is required.

  • Table Name

    • Provide a single table name or multiple table names separated by commas.

    • Example:

      employees, departments, salaries
  • Query

    • Enter a Spark SQL query for the given table(s).

    • Joins across multiple tables are supported.

    • Example:

      SELECT e.name, d.department_name 
      FROM employees e 
      JOIN departments d ON e.dept_id = d.id

Note: When using join queries, you do not need to provide table names separately.

2. Fetch Size

  • Defines the maximum number of records processed in a single execution cycle.

  • Use this to control memory usage and improve query performance.

3. Partitioning

Partitioning enhances performance by distributing read operations.

  • Create Partition

    • Creates a sequence of indexing for parallel execution.

    • Once enabled, the query will not run on the server directly.

  • Partition By (available when partitioning is enabled):

    • Auto Increment: The number of partitions is incremented automatically.

    • Index: The number of partitions is based on the specified partition column.

4. SSL Configuration

SSL is supported for PostgreSQL and ClickHouse drivers only.

  • Enable SSL: Check this option to enable SSL.

  • Certificate Folder: Select the folder containing SSL certificates. Certificates must be uploaded through the Admin settings.

Required Files:

  • Certificate file (.pem)

  • Key file (.key)

5. ClickHouse Driver Notes

  • For Spark components, the ClickHouse driver uses HTTP port instead of TCP port.

6. Redshift Driver Notes

  • Redshift configuration follows the same DB Reader setup, with the ability to specify either table names or queries.

Example Queries

  • Basic Filter Query

    SELECT * FROM employee WHERE gender = 'Male';

    Retrieves all employee records where gender is Male.

  • Join Query Across Two Tables

    SELECT e.name, s.salary
    FROM employee e
    JOIN salary s ON e.emp_id = s.emp_id

    Performs a join between the employee and salary tables.