Cassandra Writer
The Cassandra Writer is a high-performance, flexible component for writing data from BDB Data Pipelines into Apache Cassandra. It provides configurable connection options, batch settings, column mapping, consistency controls, and UUID generation to ensure reliable write operations into the target Cassandra table, making it suitable for enterprise-scale real-time pipelines and operational analytics workloads.
This component is essential for organizations that rely on Cassandra as a backend for scalable, fault-tolerant data applications.
Overview
The Cassandra Writer component:
Writes pipeline output rows into a Cassandra table
Supports configurable consistency levels
Allows custom column mapping and type conversion
Enables batch-optimized writes for performance
Supports UUID generation for primary-key creation
Allows partitioning and filtering configurations
It is designed for production workloads that require high-throughput ingestion into Cassandra clusters.
Component Placement
The component is available under:
Data Engineering → Pipelines → Components → Writers
When placed on the pipeline canvas, selecting the node displays two tabs:
Basic Information
Meta Information
The screenshot provided corresponds to the Meta Information configuration.
Basic Information Tab
This tab contains:
Component Name
Description (optional)
These fields help with identifying and documenting the writer in multi-step workflows.
Meta Information Tab
The Meta Information tab contains all configuration fields required to connect to the Cassandra cluster and define how the writer should behave.
Connection Parameters
Host IP Address*
IP address or hostname of a Cassandra cluster contact point.
Port*
Port for Cassandra's native protocol. Default is 9042.
Keyspace*
The keyspace in which the target table resides.
Table*
The Cassandra table into which data will be written.
Username*
Username for Cassandra authentication.
Password*
Password for the provided username.
Cluster
Optional descriptive name for the Cassandra cluster.
Compression Method
Compression algorithm used during communication (e.g., Snappy, LZ4).
Write Behavior Settings
Consistency
Determines the required number of replica acknowledgements (e.g., ONE, QUORUM, ALL). Impacts durability and performance.
No. of Rows Per Batch
Defines the number of rows written per batch request. Larger batches increase throughput but require more memory.
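As a sketch of how a writer might group incoming rows into fixed-size batch requests (the function name and shape are illustrative, not the component's internal API):

```python
def chunk_rows(rows, batch_size):
    """Yield successive batches of at most batch_size rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch        # emit a full batch
            batch = []
    if batch:
        yield batch            # emit the final, partial batch
```

With a batch size of 2, five rows produce two full batches and one partial batch, which is why larger batch sizes trade memory for fewer round trips.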
UUID Column Name
Specifies a column where auto-generated UUID values should be inserted. Useful when generating primary keys dynamically.
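Conceptually, UUID generation amounts to stamping each row with a fresh random UUID in the configured column before the write. A minimal sketch (the helper name and the default column name "id" are assumptions for illustration):

```python
import uuid

def add_uuid_column(row, uuid_column="id"):
    """Return a copy of the row with an auto-generated UUID
    inserted into the configured UUID column."""
    out = dict(row)
    out[uuid_column] = uuid.uuid4()  # random (version 4) UUID as a surrogate key
    return out
```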
Column Filter Section
This section enables users to map pipeline fields to Cassandra columns and manage data types.
Column Filter Fields
Name
Name of the input column in the pipeline output.
Alias Name
Optional rename for the Cassandra target column.
Column Type
Data type to use when writing to Cassandra (String, Int, UUID, Boolean, etc.).
Users can click Add New Column to define each mapping.
This is particularly useful when:
Pipeline outputs contain more columns than needed
Columns need renaming
Data type normalization is required
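The three mapping fields above (Name, Alias Name, Column Type) can be modeled as a filter-rename-cast step. The sketch below is illustrative only; the converter table covers a few of the listed types and the dictionary keys are assumed names:

```python
# Hypothetical converters mirroring a subset of the Column Type choices.
CASTS = {"String": str, "Int": int, "Boolean": bool}

def apply_column_filter(row, mappings):
    """Keep only mapped columns, rename via the alias, and cast types.

    mappings: list of dicts with 'name', optional 'alias', and 'type',
    mirroring the Name / Alias Name / Column Type fields.
    """
    out = {}
    for m in mappings:
        value = row[m["name"]]            # unmapped columns are dropped
        target = m.get("alias") or m["name"]
        out[target] = CASTS[m["type"]](value)
    return out
```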
Partition Columns
This section is used to define which incoming fields map to the partition key of the Cassandra table.
Partition keys determine:
Data distribution across nodes
Read/write routing
Clustering behavior
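The routing effect of a partition key can be illustrated with a toy model. Real Cassandra computes a Murmur3 token and maps it onto the ring topology; the sketch below substitutes a stdlib hash purely to show that identical partition-key values always route to the same node:

```python
import hashlib

def owning_node(partition_values, num_nodes):
    """Illustrative routing only: hash the partition-key values to pick
    a node index. Cassandra actually uses Murmur3 tokens and the ring,
    but the invariant shown here is the same: equal keys, equal node."""
    key = "|".join(str(v) for v in partition_values)
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_nodes
```

This is why partition-key choice drives data distribution: a skewed key funnels most rows to a single node.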
Execution Behavior
When the pipeline executes:
The Cassandra Writer establishes a session with the target cluster.
For each incoming row or batch:
Column mappings are applied
UUID values are generated (if configured)
Data is cast to the appropriate Cassandra types
Write operations are performed according to:
Batch size
Consistency level
Compression settings
Successful writes return acknowledgments; failures are logged.
If a consistency-level requirement is not met, Cassandra returns an error, and the pipeline marks the component as Failed.
Supported Write Modes
The Cassandra Writer supports:
Insert operations
Upsert-like writes (default Cassandra behavior when primary key matches)
UUID generation for primary key creation
Batch execution for higher throughput
Operations not supported:
DELETE
UPDATE (beyond Cassandra’s natural upsert semantics)
ALTER TABLE
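Cassandra's upsert semantics can be modeled with a dictionary keyed by the primary key: an INSERT for an existing key silently overwrites the previous row rather than failing. A minimal sketch (names are illustrative):

```python
def upsert(table, primary_key, row):
    """Model Cassandra's write path: an INSERT with an existing
    primary key overwrites the previous row (upsert), so no
    separate UPDATE operation is needed for whole-row replacement."""
    table[tuple(row[k] for k in primary_key)] = row
    return table
```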
Common Use Cases
Real-Time Feature Storage
Store machine learning features used by low-latency inference systems.
Operational Analytics Updating
Write enriched analytics back into Cassandra for application dashboards.
Event Stream Persistence
Persist streaming data (e.g., clickstream, IoT telemetry) into Cassandra for real-time querying.
ETL Pipelines
Load transformed data from Data Lake or relational sources into Cassandra tables.
Best Practices
Connection & Cluster Settings
Use multiple contact points or a load-balanced endpoint for production-grade availability.

Ensure network/firewall rules allow communication on port 9042.
Batch Size Optimization
Start with moderate batch sizes (100–500 rows).
Increase cautiously to balance performance vs. memory overhead.
Consistency Level Selection
ONE: fastest writes, lowest durability
QUORUM: balanced durability and latency
ALL: strongest consistency, highest latency
Choose based on application durability needs.
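The trade-off between these levels follows directly from how many replicas must acknowledge a write. The standard quorum formula is floor(RF/2) + 1; the sketch below (function names are illustrative) shows when a write can succeed at each level:

```python
def quorum(replication_factor):
    """Replicas that must acknowledge a QUORUM write: floor(RF/2) + 1."""
    return replication_factor // 2 + 1

def write_succeeds(level, replication_factor, live_replicas):
    """Whether enough replicas are alive for the chosen consistency level."""
    required = {"ONE": 1,
                "QUORUM": quorum(replication_factor),
                "ALL": replication_factor}[level]
    return live_replicas >= required
```

With a replication factor of 3, QUORUM tolerates one downed replica while ALL tolerates none, which is why ALL carries the highest latency and availability risk.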
Column Mapping Hygiene
Map only required fields.
Ensure data types match the Cassandra schema to avoid write-time errors.
UUID Usage
Auto-generate UUIDs for surrogate keys in append-only workloads.
Troubleshooting Guide
Authentication failure
Likely cause: wrong username or password. Verify the credentials and the Cassandra authentication mode.
Write timeout
Likely cause: batches too large or slow replicas. Reduce the batch size or adjust Cassandra timeout settings.
UnavailableException
Likely cause: not enough live replicas for the chosen consistency level. Lower the consistency level (e.g., QUORUM → ONE) or restore the unavailable replica nodes.
Mapping mismatch
Likely cause: a column type is incompatible with the table schema. Validate the schema and adjust the column type mapping.