DB Writer
The DB Writer is a Spark-based component for writing data to a range of relational and cloud databases. It supports the Append, Overwrite, and Upsert save modes, along with schema management and column-level filtering.
Configuration Sections
The DB Writer component configurations are organized into the following sections:
Basic Information
Meta Information
Resource Configuration
Connection Validation
Supported Drivers
The DB Writer supports the following databases:
MySQL
Oracle
PostgreSQL
MS-SQL
ClickHouse
Snowflake
Redshift
Note:
The ClickHouse driver in Spark components uses the HTTP port instead of the TCP port.
For Redshift, Boolean datatypes are not supported in auto table creation. Either provide a CREATE TABLE query or convert Boolean fields to String using a column filter.
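For background on the port note: ClickHouse serves JDBC traffic over its HTTP interface (default port 8123) rather than the native TCP port (9000). The sketch below shows typical JDBC URL shapes for the supported drivers, purely for orientation; hosts, ports, and database names are placeholders, not values the component asks for.

# Illustrative JDBC URL patterns only; hosts, ports, and database names
# are placeholders. Actual connections are configured in the component,
# not in code.
JDBC_URL_PATTERNS = {
    "MySQL":      "jdbc:mysql://host:3306/analytics_db",
    "Oracle":     "jdbc:oracle:thin:@//host:1521/service_name",
    "PostgreSQL": "jdbc:postgresql://host:5432/analytics_db",
    "MS-SQL":     "jdbc:sqlserver://host:1433;databaseName=analytics_db",
    "ClickHouse": "jdbc:clickhouse://host:8123/analytics_db",  # HTTP port, per the note above
    "Snowflake":  "jdbc:snowflake://account.snowflakecomputing.com/?db=analytics_db",
    "Redshift":   "jdbc:redshift://cluster-host:5439/analytics_db",
}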
Meta Information Tab
Database Name
Name of the target database.
analytics_db
Table Name
Target table name for data writes.
customer_orders
Enable SSL
Enable SSL for secure connections. Supported drivers: MongoDB, PostgreSQL, ClickHouse.
true
Certificate Folder
Folder containing uploaded SSL certificates (shown only if SSL enabled).
db_certs
Schema File Name
Upload a Spark schema file (JSON format); a sample schema is sketched after this section.
schema.json
Save Mode
Defines write behavior: Append, Overwrite, or Upsert.
Upsert
Sort Column
Available only in Upsert mode. Defines the column used to identify the latest record when duplicate keys exist.
updated_at
Column Filter
Select specific columns, assign aliases, and define types. See Column Filtering.
Query
Optional DDL query to create the target table before writing.
CREATE TABLE ...
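Regarding the Schema File Name field: in standard Spark, a JSON schema file is the text produced by StructType.json(), which StructType.fromJson() reads back. A minimal sketch of generating one, with illustrative field names:

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Build the schema in code, then dump Spark's standard JSON representation;
# the printed text is what a file like schema.json would contain.
schema = StructType([
    StructField("order_id", IntegerType(), nullable=False),
    StructField("order_date", StringType(), nullable=True),
])
print(schema.json())
# {"fields":[{"metadata":{},"name":"order_id","nullable":false,"type":"integer"},
#  {"metadata":{},"name":"order_date","nullable":true,"type":"string"}],"type":"struct"}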
Save Modes
Append
Inserts new data into the table. In ClickHouse, creates a StripeLog engine table.
Overwrite
Replaces existing table data with new data.
Upsert
Inserts new records or updates existing records. Requires a Composite Key and optionally a Sort Column. In ClickHouse, creates a ReplacingMergeTree engine table.
ClickHouse Notes:
Append (Spark DB Writer) → StripeLog engine.
Append (ClickHouse Writer Docker / Data Sync) → Memory engine.
Upsert → ReplacingMergeTree engine.
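For intuition, Append and Overwrite correspond to Spark's native DataFrameWriter save modes over JDBC; Upsert is not a native Spark mode, so the component supplies its own merge logic (not shown here). A minimal PySpark sketch of the first two, with placeholder connection details rather than the component's actual configuration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db-writer-sketch").getOrCreate()

# Toy frame standing in for the upstream component's output.
df = spark.createDataFrame(
    [(1, "2024-01-01"), (2, "2024-01-02")],
    ["order_id", "order_date"],
)

(df.write
   .format("jdbc")
   .option("url", "jdbc:postgresql://host:5432/analytics_db")  # placeholder URL
   .option("dbtable", "orders")
   .option("user", "writer")       # placeholder credentials
   .option("password", "secret")
   .mode("append")                 # Append; "overwrite" replaces existing data instead
   .save())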
Sort Column (Upsert Mode)
Required for the MS-SQL, PostgreSQL, Oracle, Snowflake, and ClickHouse drivers.
If multiple records share the same composite key, the Sort Column determines which record is the latest.
The record with the highest value in the sort column is retained.
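The component's internal upsert implementation is not shown here, but the retention rule above is equivalent to this window de-duplication, sketched with hypothetical column names:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Two versions of the same composite key; updated_at decides the winner.
df = spark.createDataFrame(
    [("t1", 100.0, "2024-01-01 10:00:00"),
     ("t1", 120.0, "2024-01-02 09:00:00")],
    ["transaction_id", "amount", "updated_at"],
)

w = Window.partitionBy("transaction_id").orderBy(F.col("updated_at").desc())
latest = (
    df.withColumn("_rn", F.row_number().over(w))
      .filter(F.col("_rn") == 1)   # keeps the 2024-01-02 row
      .drop("_rn")
)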
Column Filtering
The Column Filter allows selecting only specific columns for write operations.
Name
Name of the source column from upstream data.
customer_id
Alias
Alias for the column in the target table.
cust_id
Column Type
Data type for the column.
STRING
Additional Options:
Upload: Upload a CSV, JSON, or Excel file to auto-populate the schema.
Download Data: Export the schema as JSON.
Delete Data: Clear the schema configuration.
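In DataFrame terms, a column filter amounts to a projection with renames and casts. A sketch of the row above (customer_id → cust_id as STRING), plus a second, hypothetical column to show the shape:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(42, "2024-01-01")], ["customer_id", "order_date"])

# Select, rename, and cast: the DataFrame-level equivalent of the filter rows.
filtered = df.select(
    F.col("customer_id").cast("string").alias("cust_id"),   # Name/Alias/Type from above
    F.col("order_date").cast("date").alias("order_date"),   # hypothetical second row
)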
Example Configurations
Example 1: Append Mode to PostgreSQL
Database Name: analytics_db
Table Name: orders
Save Mode: Append
Column Filter:
- Name: order_id
Alias: order_id
Column Type: INT
- Name: order_date
Alias: order_date
Column Type: DATE
Example 2: Upsert Mode in ClickHouse
Database Name: finance_db
Table Name: transactions
Save Mode: Upsert
Composite Key: transaction_id
Sort Column: updated_at
Query: CREATE TABLE IF NOT EXISTS transactions (
transaction_id String,
amount Float64,
updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY transaction_id;
Example 3: Redshift Writer with Boolean Conversion
Database Name: warehouse
Table Name: employees
Save Mode: Overwrite
Column Filter:
- Name: is_active
Alias: is_active
Column Type: STRING # Boolean mapped to String
Notes
Always create the table before activating the pipeline to avoid schema mismatches.
Use a DDL statement in the Query field for precise control over the table schema.
For Boolean datatypes in Redshift, use an explicit CREATE TABLE query or convert the fields to STRING.
SSL setup requires uploading .pem and .key files in Admin Settings → Certificate Upload.