Cassandra Writer

The Cassandra Writer provides configurable connection options, batch settings, column mapping, and consistency controls to ensure reliable write operations into the target Cassandra table.

The Cassandra Writer is a high-performance, flexible component for writing data from BDB Data Pipelines into Apache Cassandra. Its support for batch optimization, column filtering, consistency control, and UUID generation makes it suitable for enterprise-scale real-time pipelines and operational analytics workloads.

This component is essential for organizations that rely on Cassandra as a backend for scalable, fault-tolerant data applications.

Overview

The Cassandra Writer component:

  • Writes pipeline output rows into a Cassandra table

  • Supports configurable consistency levels

  • Allows custom column mapping and type conversion

  • Enables batch-optimized writes for performance

  • Supports UUID generation for primary-key creation

  • Allows partitioning and filtering configurations

It is designed for production workloads that require high-throughput ingestion into Cassandra clusters.

Component Placement

The component is available under:

Data Engineering → Pipelines → Components → Writers

When placed on the pipeline canvas, selecting the node displays two tabs:

  • Basic Information

  • Meta Information

The screenshot provided corresponds to the Meta Information configuration.

Basic Information Tab

This tab contains:

  • Component Name

  • Description (optional)

These fields help with identifying and documenting the writer in multi-step workflows.

Meta Information Tab

The Meta Information tab contains all configuration fields required to connect to the Cassandra cluster and define how the writer should behave.

Connection Parameters

Fields marked with * are required.

  • Host IP Address*: IP address or hostname of a Cassandra cluster contact point.

  • Port*: Port for Cassandra's native protocol. The default is 9042.

  • Keyspace*: The keyspace in which the target table resides.

  • Table*: The Cassandra table into which data will be written.

  • Username*: Username for Cassandra authentication.

  • Password*: Password for the provided username.

  • Cluster: Optional descriptive name for the Cassandra cluster.

  • Compression Method: Compression algorithm used for client-cluster communication (e.g., Snappy, LZ4).
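As an illustration of how these fields fit together, the sketch below validates a connection configuration before a session is opened. The field names mirror the Meta Information tab, but the dictionary keys and the `validate_connection_config` helper are hypothetical, not the product's actual API.

```python
# Hypothetical sketch: checking that all required connection fields are set
# before attempting to open a Cassandra session.

REQUIRED_FIELDS = ("host", "port", "keyspace", "table", "username", "password")

def validate_connection_config(config: dict) -> list:
    """Return the required fields that are missing or empty."""
    return [f for f in REQUIRED_FIELDS if not config.get(f)]

config = {
    "host": "10.0.0.12",
    "port": 9042,            # Cassandra native-protocol default
    "keyspace": "analytics",
    "table": "events",
    "username": "writer",
    "password": "secret",
    "cluster": "prod-ring",  # optional descriptive name
    "compression": "LZ4",    # optional: e.g., Snappy or LZ4
}

missing = validate_connection_config(config)  # → [] when fully configured
```

A failed check here corresponds to the writer refusing to start rather than failing mid-pipeline.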

Write Behavior Settings

  • Consistency: Required number of replica acknowledgements per write (e.g., ONE, QUORUM, ALL). Affects durability and latency.

  • No. of Rows Per Batch: Number of rows written per batch request. Larger batches increase throughput but require more memory.

  • UUID Column Name: Column into which auto-generated UUID values are inserted. Useful when generating primary keys dynamically.
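The interaction between the batch-size and UUID settings can be sketched in pure Python. The `batch_rows` helper is illustrative only; it is not the component's internal code.

```python
import uuid

def batch_rows(rows, batch_size, uuid_column=None):
    """Split rows into batches, optionally injecting a generated UUID column.

    Mirrors the 'No. of Rows Per Batch' and 'UUID Column Name' settings.
    """
    prepared = []
    for row in rows:
        row = dict(row)
        if uuid_column:
            row[uuid_column] = uuid.uuid4()  # random surrogate key per row
        prepared.append(row)
    # Chunk into fixed-size batches for the driver to execute.
    return [prepared[i:i + batch_size] for i in range(0, len(prepared), batch_size)]

rows = [{"user": "a"}, {"user": "b"}, {"user": "c"}]
batches = batch_rows(rows, batch_size=2, uuid_column="id")
# → two batches (2 rows + 1 row), each row carrying a generated "id"
```

Larger `batch_size` values mean fewer round trips but more rows held in memory per request, which is the throughput/memory trade-off described above.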

Column Filter Section

This section enables users to map pipeline fields to Cassandra columns and manage data types.

Column Filter Fields

  • Name: Name of the input column in the pipeline output.

  • Alias Name: Optional rename for the Cassandra target column.

  • Column Type: Data type to use when writing to Cassandra (String, Int, UUID, Boolean, etc.).

Users can click Add New Column to define each mapping.

This is particularly useful when:

  • Pipeline outputs contain more columns than needed

  • Columns need renaming

  • Data type normalization is required
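The three situations above can be handled by a single mapping pass. This is a minimal sketch, assuming a `(name, alias, column_type)` tuple per Column Filter row; the function name and cast table are hypothetical.

```python
# Illustrative casts for a few Column Filter types.
CASTERS = {"String": str, "Int": int, "Boolean": bool}

def apply_column_filter(row, mappings):
    """Keep only mapped columns, rename via alias, and cast to the target type."""
    out = {}
    for name, alias, column_type in mappings:
        if name in row:
            out[alias or name] = CASTERS[column_type](row[name])
    return out

row = {"user_id": "42", "score": "7", "debug_flag": "x"}  # extra column dropped
mapped = apply_column_filter(row, [
    ("user_id", "id", "Int"),   # rename + type normalization
    ("score", None, "Int"),     # type normalization only
])
# → {"id": 42, "score": 7}
```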

Partition Columns

This section is used to define which incoming fields map to the partition key of the Cassandra table.

Partition keys determine:

  • Data distribution across nodes

  • Read/write routing

  • Clustering behavior

Note: Incorrect partition mapping can affect performance and data locality.
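To make the routing behavior concrete, the sketch below hashes a partition key onto a set of nodes. Real Cassandra uses Murmur3 hashing onto a token ring; CRC32 stands in here only to keep the example dependency-free, and the node names are invented.

```python
import zlib

def route_to_node(partition_key_values, nodes):
    """Illustrate how the partition key determines the owning node."""
    key = ":".join(str(v) for v in partition_key_values)
    token = zlib.crc32(key.encode())  # stand-in for Murmur3 token
    return nodes[token % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]
# Rows sharing a partition key always route to the same node:
assert route_to_node(["tenant-1", "2024-06"], nodes) == \
       route_to_node(["tenant-1", "2024-06"], nodes)
```

This is why a poorly chosen partition mapping hurts locality: keys that should be colocated end up scattered, and hot keys overload a single node.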

Execution Behavior

When the pipeline executes:

  1. The Cassandra Writer establishes a session with the target cluster.

  2. For each incoming row or batch:

    • Column mappings are applied

    • UUID values are generated (if configured)

    • Data is cast to the appropriate Cassandra types

  3. Write operations are performed according to:

    • Batch size

    • Consistency level

    • Compression settings

  4. Successful writes return acknowledgments; failures are logged.

If a consistency-level requirement is not met, Cassandra returns an error, and the pipeline marks the component as Failed.
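Step 4 above, acknowledging successes and logging failures, can be sketched as follows. `write_batch` stands in for the driver's batch-execution call, and the error string imitates a consistency failure; both are illustrative assumptions.

```python
def execute_writes(batches, write_batch):
    """Execute batches, counting acknowledged rows and recording failures."""
    acked, failed = 0, []
    for i, batch in enumerate(batches):
        try:
            write_batch(batch)          # stand-in for the driver call
            acked += len(batch)
        except Exception as exc:        # e.g., consistency requirement not met
            failed.append((i, str(exc)))
    return acked, failed

def fake_driver(batch):
    # Simulate a consistency failure for any row flagged "bad".
    if any(r.get("bad") for r in batch):
        raise RuntimeError("UnavailableException: not enough replicas")

acked, failed = execute_writes([[{"id": 1}], [{"id": 2, "bad": True}]], fake_driver)
# acked == 1; failed records the batch index and the error message
```

In the real component, any entry in the failure log is what drives the node's Failed status.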

Supported Write Modes

The Cassandra Writer supports:

  • Insert operations

  • Upsert-like writes (default Cassandra behavior when primary key matches)

  • UUID generation for primary key creation

  • Batch execution for higher throughput

Operations not supported:

  • DELETE

  • UPDATE (beyond Cassandra’s natural upsert semantics)

  • ALTER TABLE
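The upsert behavior follows from CQL itself: an INSERT on an existing primary key overwrites the row rather than erroring. The statement builder below is a hypothetical sketch; real drivers use prepared statements with bind markers rather than string assembly.

```python
def build_insert(keyspace, table, row):
    """Build a parameterized CQL INSERT for the given columns."""
    cols = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    return f"INSERT INTO {keyspace}.{table} ({cols}) VALUES ({marks})"

stmt = build_insert("analytics", "events", {"id": 1, "payload": "x"})
# → "INSERT INTO analytics.events (id, payload) VALUES (?, ?)"
```

Executing this statement twice with the same primary key yields one row, which is exactly the "upsert-like" behavior listed above.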

Common Use Cases

Real-Time Feature Storage

Store machine learning features used by low-latency inference systems.

Operational Analytics Updating

Write enriched analytics back into Cassandra for application dashboards.

Event Stream Persistence

Persist streaming data (e.g., clickstream, IoT telemetry) into Cassandra for real-time querying.

ETL Pipelines

Load transformed data from Data Lake or relational sources into Cassandra tables.

Best Practices

Connection & Cluster Settings

  • Use multiple contact-point IP addresses or a load-balanced endpoint for production-grade availability.

  • Ensure network/firewall rules allow communication on port 9042.

Batch Size Optimization

  • Start with moderate batch sizes (100–500 rows).

  • Increase cautiously to balance performance vs. memory overhead.

Consistency Level Selection

  • ONE: fastest writes, lowest durability

  • QUORUM: balanced durability and latency

  • ALL: strongest consistency, highest latency

Choose based on application durability needs.
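The acknowledgement counts behind these levels follow directly from the replication factor (RF); QUORUM is a strict majority of replicas. A minimal sketch:

```python
def replicas_required(consistency, replication_factor):
    """Replica acknowledgements needed per write for common levels."""
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        return replication_factor // 2 + 1  # strict majority
    if consistency == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {consistency}")

# With RF = 3: ONE → 1 ack, QUORUM → 2 acks, ALL → 3 acks
```

This also explains the UnavailableException in the troubleshooting table: if fewer live replicas exist than `replicas_required` demands, the write is rejected outright.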

Column Mapping Hygiene

  • Map only required fields.

  • Ensure data types match the Cassandra schema to avoid write-time errors.

UUID Usage

  • Auto-generate UUIDs for surrogate keys in append-only workloads.
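For reference, Python's standard `uuid` module distinguishes the two variants most relevant here: random UUIDs for surrogate keys, and time-based UUIDs that correspond to Cassandra's timeuuid ordering use cases.

```python
import uuid

random_key = uuid.uuid4()  # random UUID: good surrogate key for append-only rows
time_key = uuid.uuid1()    # time-based UUID: sortable by creation time (timeuuid)
```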

Troubleshooting Guide

  • Authentication failure: Wrong username or password. Verify the credentials and the Cassandra authentication mode.

  • Write timeout: Batch too large or replicas responding slowly. Reduce the batch size or adjust Cassandra timeout settings.

  • UnavailableException: Not enough replicas available for the chosen consistency level. Lower the consistency level (e.g., QUORUM → ONE).

  • Mapping mismatch: Column type incompatible with the target schema. Validate the schema and adjust the column type mapping.
