DB Writer

The DB Writer component is a Spark-based writer that writes data to a range of relational and cloud databases. It supports the Append, Overwrite, and Upsert save modes, along with schema management and column-level filtering.

Configuration Sections

The DB Writer component configurations are organized into the following sections:

  • Basic Information

  • Meta Information

  • Resource Configuration

  • Connection Validation

Supported Drivers

The DB Writer supports the following databases:

  • MySQL

  • Oracle

  • PostgreSQL

  • MS-SQL

  • ClickHouse

  • Snowflake

  • Redshift

Note:

  • The ClickHouse driver in Spark components uses the HTTP port instead of the TCP port.

  • For Redshift, Boolean datatypes are not supported in auto table creation. Either provide a CREATE TABLE query or convert Boolean fields to String using a column filter.
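
One way to satisfy the Redshift restriction is to stringify Boolean fields before the write. The sketch below is illustrative pure-Python logic (the record and field names are hypothetical), showing the conversion a column filter performs when Boolean is mapped to String:

```python
def stringify_booleans(record):
    """Convert Boolean values to lowercase strings so Redshift
    auto table creation never sees a Boolean column."""
    return {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in record.items()
    }

rows = [
    {"employee_id": 1, "is_active": True},
    {"employee_id": 2, "is_active": False},
]
converted = [stringify_booleans(row) for row in rows]
# is_active is now the string "true" / "false"
```

Alternatively, provide an explicit CREATE TABLE query so the table is never auto-created.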

Meta Information Tab

| Parameter | Description | Example |
| --- | --- | --- |
| Database Name | Name of the target database. | analytics_db |
| Table Name | Target table name for data writes. | customer_orders |
| Enable SSL | Enable SSL for secure connections. Supported drivers: MongoDB, PostgreSQL, ClickHouse. | true |
| Certificate Folder | Folder containing uploaded SSL certificates (shown only if SSL is enabled). | db_certs |
| Schema File Name | Upload a Spark schema file (JSON format). | schema.json |
| Save Mode | Defines write behavior: Append, Overwrite, or Upsert. | Upsert |
| Sort Column | Available only in Upsert mode. Defines the column used to identify the latest record when duplicate keys exist. | updated_at |
| Column Filter | Select specific columns, assign aliases, and define types. | See Column Filtering. |
| Query | Optional DDL query to create the target table before writing. | CREATE TABLE ... |
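
The Schema File uses Spark's StructType JSON representation. A minimal sketch (the field names are illustrative, not required):

```json
{
  "type": "struct",
  "fields": [
    {"name": "order_id",   "type": "integer", "nullable": true, "metadata": {}},
    {"name": "order_date", "type": "date",    "nullable": true, "metadata": {}}
  ]
}
```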

Save Modes

| Save Mode | Behavior |
| --- | --- |
| Append | Inserts new data into the table. In ClickHouse, creates a StripeLog engine table. |
| Overwrite | Replaces existing table data with new data. |
| Upsert | Inserts new records or updates existing records. Requires a Composite Key and optionally a Sort Column. In ClickHouse, creates a ReplacingMergeTree engine table. |

ClickHouse Notes:

  • Append (Spark DB Writer) → StripeLog engine.

  • Append (ClickHouse Writer Docker / Data Sync) → Memory engine.

  • Upsert → ReplacingMergeTree engine.

Sort Column (Upsert Mode)

  • Required for MSSQL, PostgreSQL, Oracle, Snowflake, and ClickHouse drivers.

  • If multiple records share the same composite key, the Sort Column determines which record is the latest.

  • The record with the highest value in the sort column is retained.
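
The deduplication rule above can be sketched in plain Python (the record and field names are hypothetical): group incoming records by composite key and keep the one with the highest sort-column value.

```python
def deduplicate(records, composite_key, sort_column):
    """Keep one record per composite key: the record with the
    highest sort-column value (i.e. the latest record)."""
    latest = {}
    for record in records:
        key = tuple(record[field] for field in composite_key)
        if key not in latest or record[sort_column] > latest[key][sort_column]:
            latest[key] = record
    return list(latest.values())

records = [
    {"transaction_id": "t1", "amount": 10.0, "updated_at": "2024-01-01"},
    {"transaction_id": "t1", "amount": 12.5, "updated_at": "2024-02-01"},
    {"transaction_id": "t2", "amount": 7.0,  "updated_at": "2024-01-15"},
]
result = deduplicate(records, ["transaction_id"], "updated_at")
# t1 resolves to the 2024-02-01 record
```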

Column Filtering

The Column Filter allows selecting only specific columns for write operations.

| Field | Description | Example |
| --- | --- | --- |
| Name | Name of the source column from upstream data. | customer_id |
| Alias | Alias for the column in the target table. | cust_id |
| Column Type | Data type for the column. | STRING |

Additional Options:

  • Upload: Upload a CSV/JSON/Excel file to auto-populate the schema.

  • Download Data: Export the schema as JSON.

  • Delete Data: Clear the schema configuration.
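
The effect of a column filter can be sketched as a select-rename-cast step. This pure-Python illustration uses hypothetical column names and cast functions:

```python
# Each entry: source column name -> (alias in target table, cast function)
COLUMN_FILTER = {
    "customer_id": ("cust_id", str),
    "order_total": ("total", float),
}

def apply_column_filter(record, column_filter):
    """Keep only the configured columns, rename them, and cast types.
    Columns absent from the filter (e.g. internal flags) are dropped."""
    return {
        alias: cast(record[name])
        for name, (alias, cast) in column_filter.items()
        if name in record
    }

row = {"customer_id": 42, "order_total": "19.99", "internal_flag": True}
filtered = apply_column_filter(row, COLUMN_FILTER)
# → {"cust_id": "42", "total": 19.99}
```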

Example Configurations

Example 1: Append Mode to PostgreSQL

Database Name: analytics_db
Table Name: orders
Save Mode: Append
Column Filter:
  - Name: order_id
    Alias: order_id
    Column Type: INT
  - Name: order_date
    Alias: order_date
    Column Type: DATE

Example 2: Upsert Mode in ClickHouse

Database Name: finance_db
Table Name: transactions
Save Mode: Upsert
Composite Key: transaction_id
Sort Column: updated_at
Query: CREATE TABLE IF NOT EXISTS transactions (
          transaction_id String,
          amount Float64,
          updated_at DateTime
       )
       ENGINE = ReplacingMergeTree(updated_at)
       ORDER BY transaction_id;

Example 3: Redshift Writer with Boolean Conversion

Database Name: warehouse
Table Name: employees
Save Mode: Overwrite
Column Filter:
  - Name: is_active
    Alias: is_active
    Column Type: STRING  # Boolean mapped to String

Notes

  • Always create the table before activating the pipeline to avoid schema mismatches.

  • Use DDL in the Query field for precise control over the table schema.

  • For Boolean datatypes in Redshift, use explicit CREATE TABLE or convert to STRING.

  • SSL setup requires uploading .pem and .key files in Admin Settings → Certificate Upload.