Data Pipeline
  • Data Pipeline
    • About Data Pipeline
    • Design Philosophy
    • Low Code Visual Authoring
    • Real-time and Batch Orchestration
    • Event based Process Orchestration
    • ML and Data Ops
    • Distributed Compute
    • Fault Tolerant and Auto-recovery
    • Extensibility via Custom Scripting
  • Getting Started
    • Homepage
      • List Pipelines
      • Create
        • Creating a New Pipeline
          • Adding Components to Canvas
          • Connecting Components
            • Events [Kafka and Data Sync]
          • Memory and CPU Allocations
        • Creating a New Job
          • Job Editor Page
          • Task Components
            • Readers
              • HDFS Reader
              • MongoDB Reader
              • DB Reader
              • S3 Reader
              • Azure Blob Reader
              • ES Reader
              • Sandbox Reader
            • Writers
              • HDFS Writer
              • Azure Writer
              • DB Writer
              • ES Writer
              • S3 Writer
              • Sandbox Writer
              • Mongodb Writer
              • Kafka Producer
            • Transformations
          • PySpark Job
          • Python Job
      • List Jobs
      • List Components
      • Delete Orphan Pods
      • Scheduler
      • Data Channel & Cluster Events
      • Trash
      • Settings
    • Pipeline Workflow Editor
      • Pipeline Toolbar
        • Pipeline Overview
        • Pipeline Testing
        • Search Component in Pipelines
        • Push Pipeline (to VCS/GIT)
        • Pull Pipeline
        • Full Screen
        • Log Panel
        • Event Panel
        • Activate/Deactivate Pipeline
        • Update Pipeline
        • Failure Analysis
        • Pipeline Monitoring
        • Delete Pipeline
        • Pipeline Component Configuration
        • Pipeline Failure Alert History
      • Component Panel
      • Right-side Panel
    • Testing Suite
    • Activating Pipeline
    • Monitoring Pipeline
    • Job Monitoring
  • Components
    • Adding Components to Workflow
    • Component Architecture
    • Component Base Configuration
    • Resource Configuration
    • Intelligent Scaling
    • Connection Validation
    • Readers
      • GCS Reader
      • S3 Reader
      • HDFS Reader
      • DB Reader
      • ES Reader
      • SFTP Stream Reader
      • SFTP Reader
      • Mongo DB Reader
        • MongoDB Reader Lite (PyMongo Reader)
        • MongoDB Reader
      • Azure Blob Reader
      • Azure Metadata Reader
      • ClickHouse Reader (Docker)
      • Sandbox Reader
      • Azure Blob Reader (Docker)
    • Writers
      • S3 Writer
      • DB Writer
      • HDFS Writer
      • ES Writer
      • Video Writer
      • Azure Writer
      • ClickHouse Writer (Docker)
      • Sandbox Writer
      • MongoDB Writers
        • MongoDB Writer
        • MongoDB Writer Lite (PyMongo Writer)
    • Machine Learning
      • DSLab Runner
      • AutoML Runner
    • Consumers
      • GCS Monitor
      • Sqoop Executer
      • OPC UA
      • SFTP Monitor
      • MQTT Consumer
      • Video Stream Consumer
      • Eventhub Subscriber
      • Twitter Scrapper
      • Mongo ChangeStream
      • Rabbit MQ Consumer
      • AWS SNS Monitor
      • Kafka Consumer
      • API Ingestion and Webhook Listener
    • Producers
      • WebSocket Producer
      • Eventhub Publisher
      • EventGrid Producer
      • RabbitMQ Producer
      • Kafka Producer
      • Synthetic Data Generator
    • Transformations
      • SQL Component
      • File Splitter
      • Rule Splitter
      • Stored Producer Runner
      • Flatten JSON
      • Pandas Query Component
      • Enrichment Component
      • Mongo Aggregation
      • Data Loss Protection
      • Data Preparation (Docker)
      • Rest Api Component
      • Schema Validator
    • Scripting
      • Script Runner
      • Python Script
        • Keeping Different Versions of the Python Script in VCS
    • Scheduler
    • Alerts
      • Alerts
      • Email Component
  • Custom Components
  • Advance Configuration & Monitoring
    • Configuration
      • Default Component Configuration
      • Logger
    • Data Channel
    • Cluster Events
    • System Component Status
  • Version Control
  • Use Cases
Powered by GitBook
On this page
  • Drivers Available
  • Save Modes
  1. Components
  2. Writers

DB Writer

PreviousS3 WriterNextHDFS Writer

Last updated 1 year ago

The DB reader is a spark-based writer component which gives you capability to write data to multiple database sources.

All component configurations are classified broadly into the following sections:

  • ​​

  • Meta Information

  • ​​

  • ​​

Please check out the given demonstration to configure the component.

Drivers Available

Supported Drivers
  • MySQL

  • Oracle

  • PostgreSQL

  • MS-SQL

  • ClickHouse

  • Snowflake

  • Redshift

Please Note:

  • The ClickHouse driver in the Spark components will use the HTTP Port and not the TCP port.

  • It is always recommended to create the table before activating the pipeline to avoid errors as RDBMS has a strict schema and can result in errors.

  • When using the Redshift driver with a Boolean datatype in JDBC, the table is not created unless you pass the create table query. Alternatively, you can use a column filter to convert a Boolean value to a String for the desired operation.

Save Modes

The RDBMS writer supports 3 save modes:

Append

As the name suggests it adds all the records without any validations.

Overwrite

This mode truncates the table and adds fresh records. after every run you will get records that are part of the batch process.

Upsert

This operation allows the users to insert a new record or update existing data into a table. For configuring this we need to provide the Composite Key.

The BDB Data Pipeline supports composite key based upsert, in case of composite key, we can specify the second key by using comma separator e.g., key1, key2​. It has now an option to upload the spark schema in JSON format. This can greatly improve the speed of the write operation as the component will ignore inferring schema and go with the provided schema.

  • Sort Column: This field will appear only when Upsert is selected as Save mode. If there are multiple records with the same composite key but different values in the batch, the system identifies the record with the latest value based on the Sort column. The Sort column defines the ordering of records, and the record with the highest value in the sort column is considered the latest.

Please Note:

  • Currently, the Sort column field is only available for the following drivers in the DB Writer: MSSQL, PostgreSQL, Oracle, Snowflake, and ClickHouse.

  • Database name: Enter the Database name.

  • Table name: Provide a table name where the data has to be written.

  • Enable SSL: Check this box to enable SSL for this components. Enable SSL feature in DB reader component will appear only for three(3) drivers: MongoDB, PostgreSQL and ClickHouse.

  • Certificate Folder: This option will appear when the Enable SSL field is checked-in. The user has to select the certificate folder from drop down which contains the files which has been uploaded to the admin settings. Please refer the below given images for the reference.

  • Schema File Name: Upload the Spark Schema in JSON format.

  • Query: In this field, we can write a DDL for creating the table in database where the in-event data has to be written. For example, please refer the below image:

Please Note:

In DB Writer component, the Save Mode for ClickHouse driver is as follows:

  • Append: It will create a table in ClickHouse database with a table engine Stripelog.

  • Upsert: It will create a table in ClickHouse database with a table engine ReplacingMergeTree.

​Basic Information​
Resource Configuration​
Connection Validation
DB Writer configuration for ClickHouse driver
Configuring DB Writer component for Redshift Driver
Meta information tab for DB Writer
Sort Column in Upsert Mode for DB Writer (ClickHouse driver)
Writing a DDL for creating a table in DB.
Meta Information fields of the DB writer (with SSL enabled)