Cassandra Writer
The Cassandra Writer is a high-performance, flexible component for writing data from BDB Data Pipelines into Apache Cassandra. It provides configurable connection options, batch settings, column mapping, consistency controls, and UUID generation to ensure reliable write operations into the target Cassandra table, making it suitable for enterprise-scale real-time pipelines and operational analytics workloads.
This component is essential for organizations that rely on Cassandra as a backend for scalable, fault-tolerant data applications.
Overview
The Cassandra Writer component:
Writes pipeline output rows into a Cassandra table
Supports configurable consistency levels
Allows custom column mapping and type conversion
Enables batch-optimized writes for performance
Supports UUID generation for primary-key creation
Allows partitioning and filtering configurations
It is designed for production workloads that require high-throughput ingestion into Cassandra clusters.
Component Placement
The component is available under:
Data Engineering → Pipelines → Components → Writers
When placed on the pipeline canvas, selecting the node displays two tabs:
Basic Information
Meta Information
The screenshot provided corresponds to the Meta Information configuration.
Basic Information Tab
This tab contains:
Component Name
Description (optional)
These fields help with identifying and documenting the writer in multi-step workflows.
Meta Information Tab
The Meta Information tab contains all configuration fields required to connect to the Cassandra cluster and define how the writer should behave.
Connection Parameters
Host IP Address*
IP address or hostname of a Cassandra cluster contact point.
Port*
Port for Cassandra's native protocol. Default is 9042.
Keyspace*
The keyspace in which the target table resides.
Table*
The Cassandra table into which data will be written.
Username*
Username for Cassandra authentication.
Password*
Password for the provided username.
Cluster
Optional descriptive name for the Cassandra cluster.
Compression Method
Compression algorithm used during communication (e.g., Snappy, LZ4).
Write Behavior Settings
Consistency
Determines the required number of replica acknowledgements (e.g., ONE, QUORUM, ALL). Impacts durability and performance.
No. of Rows Per Batch
Defines the number of rows written per batch request. Larger batches increase throughput but require more memory.
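As a sketch of how a writer might group incoming rows into fixed-size batch requests (the function name and shape are illustrative, not the component's internal API):

```python
def chunk_rows(rows, batch_size):
    """Yield successive batches of at most batch_size rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch        # emit a full batch
            batch = []
    if batch:
        yield batch            # emit the final, partial batch
```

With a batch size of 2, five rows produce two full batches and one partial batch, which is why larger batch sizes trade memory for fewer round trips.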
UUID Column Name
Specifies a column where auto-generated UUID values should be inserted. Useful when generating primary keys dynamically.
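Conceptually, UUID generation amounts to stamping each row with a fresh random UUID in the configured column before the write. A minimal sketch (the helper name and the default column name "id" are assumptions for illustration):

```python
import uuid

def add_uuid_column(row, uuid_column="id"):
    """Return a copy of the row with an auto-generated UUID
    inserted into the configured UUID column."""
    out = dict(row)
    out[uuid_column] = uuid.uuid4()  # random (version 4) UUID as a surrogate key
    return out
```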
Column Filter Section
This section enables users to map pipeline fields to Cassandra columns and manage data types.
Column Filter Fields
Name
Name of the input column in the pipeline output.
Alias Name
Optional rename for the Cassandra target column.
Column Type
Data type to use when writing to Cassandra (String, Int, UUID, Boolean, etc.).
Users can click Add New Column to define each mapping.
This is particularly useful when:
Pipeline outputs contain more columns than needed
Columns need renaming
Data type normalization is required
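The three mapping fields above (Name, Alias Name, Column Type) can be modeled as a filter-rename-cast step. The sketch below is illustrative only; the converter table covers a few of the listed types and the dictionary keys are assumed names:

```python
# Hypothetical converters mirroring a subset of the Column Type choices.
CASTS = {"String": str, "Int": int, "Boolean": bool}

def apply_column_filter(row, mappings):
    """Keep only mapped columns, rename via the alias, and cast types.

    mappings: list of dicts with 'name', optional 'alias', and 'type',
    mirroring the Name / Alias Name / Column Type fields.
    """
    out = {}
    for m in mappings:
        value = row[m["name"]]            # unmapped columns are dropped
        target = m.get("alias") or m["name"]
        out[target] = CASTS[m["type"]](value)
    return out
```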
Partition Columns
This section is used to define which incoming fields map to the partition key of the Cassandra table.
Partition keys determine:
Data distribution across nodes
Read/write routing
Clustering behavior
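The routing effect of a partition key can be illustrated with a toy model. Real Cassandra computes a Murmur3 token and maps it onto the ring topology; the sketch below substitutes a stdlib hash purely to show that identical partition-key values always route to the same node:

```python
import hashlib

def owning_node(partition_values, num_nodes):
    """Illustrative routing only: hash the partition-key values to pick
    a node index. Cassandra actually uses Murmur3 tokens and the ring,
    but the invariant shown here is the same: equal keys, equal node."""
    key = "|".join(str(v) for v in partition_values)
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_nodes
```

This is why partition-key choice drives data distribution: a skewed key funnels most rows to a single node.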
Execution Behavior
When the pipeline executes:
The Cassandra Writer establishes a session with the target cluster.
For each incoming row or batch:
Column mappings are applied
UUID values are generated (if configured)
Data is cast to the appropriate Cassandra types
Write operations are performed according to:
Batch size
Consistency level
Compression settings
Successful writes return acknowledgments; failures are logged.
If a consistency-level requirement is not met, Cassandra returns an error, and the pipeline marks the component as Failed.
Supported Write Modes
The Cassandra Writer supports:
Insert operations
Upsert-like writes (default Cassandra behavior when primary key matches)
UUID generation for primary key creation
Batch execution for higher throughput
Operations not supported:
DELETE
UPDATE (beyond Cassandra’s natural upsert semantics)
ALTER TABLE
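Cassandra's upsert semantics can be modeled with a dictionary keyed by the primary key: an INSERT for an existing key silently overwrites the previous row rather than failing. A minimal sketch (names are illustrative):

```python
def upsert(table, primary_key, row):
    """Model Cassandra's write path: an INSERT with an existing
    primary key overwrites the previous row (upsert), so no
    separate UPDATE operation is needed for whole-row replacement."""
    table[tuple(row[k] for k in primary_key)] = row
    return table
```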
Common Use Cases
Real-Time Feature Storage
Store machine learning features used by low-latency inference systems.
Operational Analytics Updating
Write enriched analytics back into Cassandra for application dashboards.
Event Stream Persistence
Persist streaming data (e.g., clickstream, IoT telemetry) into Cassandra for real-time querying.
ETL Pipelines
Load transformed data from Data Lake or relational sources into Cassandra tables.
Best Practices
Connection & Cluster Settings
Use multiple contact points or a load-balanced endpoint for production-grade availability.

Ensure network/firewall rules allow communication on port 9042.
Batch Size Optimization
Start with moderate batch sizes (100–500 rows).
Increase cautiously to balance performance vs. memory overhead.
Consistency Level Selection
ONE: fastest writes, lowest durability
QUORUM: balanced durability and latency
ALL: strongest consistency, highest latency
Choose based on application durability needs.
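The trade-off between these levels follows directly from how many replicas must acknowledge a write. The standard quorum formula is floor(RF/2) + 1; the sketch below (function names are illustrative) shows when a write can succeed at each level:

```python
def quorum(replication_factor):
    """Replicas that must acknowledge a QUORUM write: floor(RF/2) + 1."""
    return replication_factor // 2 + 1

def write_succeeds(level, replication_factor, live_replicas):
    """Whether enough replicas are alive for the chosen consistency level."""
    required = {"ONE": 1,
                "QUORUM": quorum(replication_factor),
                "ALL": replication_factor}[level]
    return live_replicas >= required
```

With a replication factor of 3, QUORUM tolerates one downed replica while ALL tolerates none, which is why ALL carries the highest latency and availability risk.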
Column Mapping Hygiene
Map only required fields.
Ensure data types match the Cassandra schema to avoid write-time errors.
UUID Usage
Auto-generate UUIDs for surrogate keys in append-only workloads.
Troubleshooting Guide
Authentication failure
Likely cause: wrong username or password. Verify the credentials and the Cassandra authentication mode.
Write timeout
Likely cause: batches too large or slow replicas. Reduce the batch size or adjust Cassandra timeout settings.
UnavailableException
Likely cause: not enough live replicas for the chosen consistency level. Lower the consistency level (e.g., QUORUM → ONE) or restore the unavailable replica nodes.
Mapping mismatch
Likely cause: a column type is incompatible with the table schema. Validate the schema and adjust the column type mapping.