Pipeline Settings

The Pipeline Settings module in the BDB Platform provides administrators with configuration options to manage scheduling, event tracking, logging, default pipeline settings, system components, and data synchronization.

This module includes the following sections:

Scheduler List

The Scheduler List displays scheduled executions of data pipelines. Administrators can view and manage scheduler details, including execution time and pipeline information.

Access the Scheduler List

  1. In the Admin menu panel, click Pipeline Settings.

  2. Select Scheduler List from the context menu.

  3. The Scheduler List page opens, displaying:

    • Scheduler Name

    • Scheduler Time

    • Next Run Time

    • Pipeline Name

By default, the first scheduler’s details open on the right side of the page.

Additional options

  • Search bar: Search for a specific scheduler entry.

  • Refresh icon: Refresh the scheduler list.

Data Channel & Cluster Events

The Data Channel & Cluster Events section provides an overview of Kafka-based pipeline management, including brokers, topics, consumers, and pipeline event details.

Access Data Channel & Cluster Events

  1. In the Admin menu panel, click Pipeline Settings.

  2. Select Data Channel & Cluster Events from the context menu.

  3. The page opens with two sections:

    • Data Channel (left panel)

    • Pipeline & Topics (right panel)

Data Channel section

  • Broker Info: Lists Kafka broker instances.

    • A red dot indicates that the broker is down or unreachable.

    • A partition count of 0 indicates Kafka is not actively serving data.

  • Consumer Info: Displays active Kafka consumers and the number of rebalancing operations.

  • Topic Info: Shows the total number of Kafka topics.

  • Version: Displays the Kafka version in use.
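
The information summarized in this panel can also be inspected directly against the Kafka cluster. The following is a minimal sketch using the kafka-python client and is not part of the platform; the broker address is an assumption and should be replaced with your cluster's bootstrap servers.

```python
# Minimal sketch: inspect a Kafka cluster the way the Data Channel panel
# summarizes it. The bootstrap address below is an assumption.
from kafka import KafkaAdminClient, KafkaConsumer

BOOTSTRAP = "localhost:9092"  # assumed broker address

admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
consumer = KafkaConsumer(bootstrap_servers=BOOTSTRAP)

# Broker Info: brokers currently registered with the cluster.
cluster = admin.describe_cluster()
print("Brokers:", [(b["host"], b["port"]) for b in cluster["brokers"]])

# Topic Info: total number of Kafka topics.
topics = consumer.topics()
print("Topics:", len(topics))

# A topic with 0 partitions is not actively serving data.
for topic in sorted(topics):
    partitions = consumer.partitions_for_topic(topic) or set()
    print(f"  {topic}: {len(partitions)} partition(s)")

# Consumer Info: consumer groups known to the cluster.
print("Consumer groups:", [group for group, _ in admin.list_consumer_groups()])

admin.close()
consumer.close()
```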

Pipeline & Topics section

Displays the list of pipelines with corresponding topic details:

  • Pipeline Name

  • Number of Events

  • Status of Kafka events

  • Active status of the pipeline

Flush or delete pipeline events

  1. Navigate to the Pipeline & Topics list.

  2. Select a pipeline and expand topic details.

  3. At the bottom, choose one of the following:

    • Flush All

    • Delete All

  4. Confirm the action.

  • The Flush All and Delete All options are disabled while Kafka events are active (a Kafka-level sketch of these actions follows below).
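
The platform performs these actions for you; the sketch below only illustrates, using the kafka-python admin client, what clearing a pipeline's events amounts to at the Kafka level. The broker address and topic name are placeholders.

```python
# Illustrative sketch only; this is not the platform's Flush All / Delete All
# implementation. Broker address and topic names are assumptions.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")  # assumed

pipeline_topics = ["my-pipeline-events"]  # hypothetical pipeline topic

# "Delete All" roughly corresponds to removing the pipeline's topics.
admin.delete_topics(pipeline_topics)

# "Flush All" roughly corresponds to emptying them; one crude approximation is
# re-creating the topics once deletion has completed (topic deletion is
# asynchronous, so a real tool would wait or delete records instead).
admin.create_topics([
    NewTopic(name=t, num_partitions=1, replication_factor=1)
    for t in pipeline_topics
])
admin.close()
```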

Logger

The Logger section allows administrators to configure logging for system components.

Log types

  • Custom Log: User-defined or system-specific logs.

  • Developer Logs: Backend and developer-centric events.

  • UI Logs: Frontend/UI-related events such as user interactions or errors.

Access and configure Logger

  1. In the Admin menu panel, click Pipeline Settings.

  2. Select Logger from the context menu.

  3. Configure the required logger values (e.g., log file, duration in ms).

  4. Click Save.

  5. A notification confirms the logger configuration update.
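
The values the Logger screen captures (a target log file and a duration) have direct counterparts in ordinary application logging. The following is a purely illustrative sketch using Python's standard logging module; the file path and rollover interval are assumptions, and the platform applies its own settings to its components.

```python
# Minimal sketch, not the platform's Logger: it only shows the kind of values
# the Logger screen asks for (log file plus a time-based rollover duration).
import logging
from logging.handlers import TimedRotatingFileHandler

handler = TimedRotatingFileHandler(
    "/var/log/bdb/pipeline-ui.log",  # hypothetical log file
    when="H", interval=1,            # roll over every hour
    backupCount=24,                  # keep 24 rotated files
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))

logger = logging.getLogger("ui")     # e.g. a UI Logs-style logger
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("Logger configuration updated")
```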

Default Configuration

Administrators can set default configurations for pipelines and jobs using either Spark or Docker.

Access Default Configuration

  1. In the Admin menu panel, click Pipeline Settings.

  2. Select Default Configuration.

  3. The Default Configuration page opens.

  4. Select the Pipeline or Job tab (the Pipeline tab opens by default).

  5. Configure the following options:

    • Engine type: Spark (default) or Docker

    • Resource allocation: Low (default), Medium, or High

    • Processing mode: Batch (default) or Realtime

Spark default configuration

Driver

  • Core: 0.5

  • Core Limit: 2048

  • Memory: 1024 MB

Executor

  • Core: 1

  • Instances: 1

  • Memory: 1024 MB

  • Max Instances: 1
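
These Spark defaults correspond to standard Spark properties. The sketch below is illustrative only; it assumes the platform submits pipelines to Spark on Kubernetes and that Core Limit is expressed in millicores. The platform applies these values for you.

```python
# Sketch of the Spark defaults above expressed as Spark properties, assuming
# Spark on Kubernetes and millicore units for Core Limit.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("pipeline-defaults-sketch")
    # Driver defaults
    .config("spark.kubernetes.driver.request.cores", "0.5")   # Core
    .config("spark.kubernetes.driver.limit.cores", "2048m")   # Core Limit
    .config("spark.driver.memory", "1024m")                   # Memory
    # Executor defaults
    .config("spark.executor.cores", "1")                      # Core
    .config("spark.executor.instances", "1")                  # Instances
    .config("spark.executor.memory", "1024m")                 # Memory
    .config("spark.dynamicAllocation.maxExecutors", "1")      # Max Instances
    .getOrCreate()
)
```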

Docker default configuration

Limit (max per instance)

  • Memory: 500 MB

  • CPU: 0.1 vCPU

  • Max Instances: 1

Request (min per instance)

  • Memory: 251 MB

  • CPU: 0.1 vCPU

  • Instances: 1

Click Save to apply settings.

  • Job defaults can be set from the Job tab.

  • Both Spark and Docker defaults can be configured here; the sketch below shows how the Docker values map onto container resource requests and limits.
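
A minimal sketch, assuming the platform deploys Docker components as Kubernetes containers (and treating MB as Mi for simplicity); the component name and image are placeholders, and the platform applies the defaults itself.

```python
# Sketch only: the Docker defaults expressed as Kubernetes-style resource
# requests and limits via the kubernetes Python client.
from kubernetes import client

resources = client.V1ResourceRequirements(
    limits={"memory": "500Mi", "cpu": "100m"},    # Limit: 500 MB, 0.1 vCPU
    requests={"memory": "251Mi", "cpu": "100m"},  # Request: 251 MB, 0.1 vCPU
)
container = client.V1Container(
    name="pipeline-component",                    # hypothetical name
    image="example/pipeline-component:1.0",       # hypothetical image
    resources=resources,
)
```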

System Component Status

The System Component Status section monitors the health and performance of core services (e.g., Kubernetes pods) supporting pipeline operations.

Access System Component Status

  1. In the Admin menu panel, click Pipeline Settings.

  2. Select System Component Status.

View pod details

The System Pod Details page lists:

  • Name of the pod

  • Status (e.g., Running)

  • Created At timestamp

  • Age since creation

  • Version of the component

  • Restart count

  • CPU (Used/Requested)

  • Memory (Used/Requested)

Use the Refresh option to update pod status.
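
For reference, the same pod details can be retrieved with the Kubernetes Python client. This is a sketch only: it assumes cluster access and a hypothetical namespace name, and live CPU/memory usage would additionally require the metrics API.

```python
# Sketch only: list pod name, status, creation time, and restart count.
# The namespace "bdb" is an assumption.
from kubernetes import client, config

config.load_kube_config()   # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(namespace="bdb").items:
    restarts = sum(cs.restart_count for cs in (pod.status.container_statuses or []))
    print(pod.metadata.name, pod.status.phase,
          pod.metadata.creation_timestamp, f"restarts={restarts}")
```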

Data Sync

The Data Sync section allows administrators to create, configure, and manage data synchronization connections.

Create a Data Sync connection

  1. In the Admin menu panel, click Pipeline Settings.

  2. Select Data Sync.

  3. Click Create Data Sync Connection.

  4. In the Create Data Sync drawer, provide:

    • Connection Name

    • Host and Port

    • Username/Password

    • Driver (e.g., MongoDB)

    • Connection Type (e.g., Standard)

    • Enable SSL and Certificate Folder (if required)

    • Database Name

    • Additional Params (e.g., authSource=admin)

  5. Click Save.

A success message confirms the creation of the new Data Sync connection.
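
For context, the fields in the Create Data Sync drawer mirror what a direct client connection needs. Below is a minimal sketch using pymongo for the MongoDB example above; the host, credentials, certificate path, and database name are placeholders, and the platform builds and manages the real connection.

```python
# Sketch only: a MongoDB connection with the same fields the Data Sync form
# collects. All values below are placeholders.
from pymongo import MongoClient

mongo = MongoClient(
    host="mongo.example.com",   # Host
    port=27017,                 # Port
    username="sync_user",       # Username
    password="secret",          # Password
    authSource="admin",         # Additional Params: authSource=admin
    tls=True,                   # Enable SSL
    tlsCAFile="/certs/ca.pem",  # assumed CA file (the form asks for a certificate folder)
)
db = mongo["analytics"]         # Database Name
print(db.list_collection_names())
```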

Manage Data Sync connections

  • Connect: Activate a Data Sync. The Connect icon changes to Disconnect.

  • Disconnect: Stop a Data Sync. Confirm in the dialog box.

  • Edit: Open the Edit Data Sync drawer to modify connection settings.

  • Delete: Remove the Data Sync connection.

List Components

The List Components page shows all pipeline components. Administrators can create System or Custom components.

Create a component

  1. Navigate to List Components.

  2. Click Create.

  3. Fill out Basic Information:

    • Name

    • Deployment Type (Spark/Docker)

    • Image Name

    • Version

    • Component Type (System/Custom)

    • Component Group (Readers, Writers, Transformers)

  4. Configure Ports:

    • Port Name

    • Port Number

  5. Configure Spark Component Information (if applicable):

    • Main Class

    • Main Application File

    • Runtime Environment Type

    • Cluster

  6. Click Save.

A success notification confirms creation.

  • Ensure Docker images are created and pushed to the repository (a sketch of this step follows below). DevOps assistance may be required.

  • For Docker deployment type, only Basic Information is required.

  • Use the View icon to edit existing component configurations.
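
Building and pushing the component image happens outside the platform. The sketch below uses the Docker SDK for Python; the repository, tag, and build context are placeholders, and running docker build and docker push directly works just as well.

```python
# Sketch only: build and push a component image with the Docker SDK for
# Python. Repository name, tag, and build context path are assumptions.
import docker

docker_client = docker.from_env()

image, build_logs = docker_client.images.build(
    path="./my-component",                        # hypothetical build context
    tag="registry.example.com/my-component:1.0",  # hypothetical Image Name
)
docker_client.images.push("registry.example.com/my-component", tag="1.0")
```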

Job BaseInfo

The Job BaseInfo section defines job templates supported by the Data Engineering module:

  • PySpark Job

  • Spark Job

  • Script Executor

  • Python Job

Job BaseInfo is preconfigured by administrators and should not be created by users.

Create Job BaseInfo

  1. Click Create on the Job BaseInfo page.

  2. Configure Basic Information:

    • Name

    • Deployment type

    • Image Name

    • Version

    • isExposed (auto-filled)

    • Job type

  3. Configure Ports (Add/Delete as needed).

  4. Enter:

    • Main Class

    • Main Application File

    • Runtime Environment Type (Scala, Python, R)

  5. Click Save.

Use the List Job BaseInfo icon to view existing jobs.
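
For illustration, the Main Application File of a PySpark Job is typically a small script like the sketch below. The file name, application name, and input path are placeholders; the Job BaseInfo entry only tells the platform where to find such a file and how to run it.

```python
# Minimal sketch of a Main Application File for a PySpark Job. Paths and
# names are assumptions.
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName("example-pyspark-job").getOrCreate()
    df = spark.read.csv("/data/input.csv", header=True)  # hypothetical input
    df.show(5)
    spark.stop()

if __name__ == "__main__":
    main()
```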

Namespace Settings

The Namespace Settings option allows administrators to define logical groupings for pipeline resources. Namespaces provide isolation, improve security, and support multi-project environments.

Configure a namespace

  1. In the Admin menu panel, click Pipeline Settings.

  2. Select Namespace Settings.

  3. Enter values for:

    • Namespace Name (e.g., dev-pipeline)

    • Node Pool key-value pairs

  4. Click Save.

A success notification confirms the namespace configuration.

  • Use the Add and Delete icons to manage key-value pairs.
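
If the platform is backed by Kubernetes, a namespace configuration like the example above roughly corresponds to the sketch below. This is illustrative only; the namespace name and node-pool label are modeled on the example, and the platform performs the provisioning itself.

```python
# Sketch only: create a namespace and define a node-pool selector with the
# Kubernetes Python client. Names and labels are assumptions.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Namespace Name, e.g. dev-pipeline
v1.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name="dev-pipeline"))
)

# Node Pool key-value pairs become node selectors on the pipeline's pods.
node_selector = {"nodepool": "pipeline-workers"}  # hypothetical key-value pair
```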
