Azure Cosmos DB Writer
This Writer is ideal for enterprise scenarios requiring high availability, global distribution, and operational analytics workloads.
The Azure Cosmos DB Writer component enables BDB Data Pipeline users to write pipeline-generated datasets into Azure Cosmos DB, Microsoft’s globally distributed, multi-model NoSQL database. It is well suited to operational workloads that require low-latency reads, elastic scalability, and integration with cloud-native NoSQL data stores.
The Cosmos DB Writer component supports flexible schema mapping, partition key assignment, and selectable write modes to help users reliably persist transformed or enriched data.
Overview
The Azure Cosmos DB Writer provides the ability to:
Write structured or semi-structured data into Cosmos DB containers
Map pipeline fields to Cosmos DB document properties
Define a partition key to optimize performance
Control overwrite or append write behavior through Save Mode
Securely authenticate using an endpoint and master key
This Writer is typically used in pipelines that deliver operational data to cloud applications, API-driven services, IoT event stores, and real-time analytics solutions.
Component Placement
The component can be found under:
Data Engineering → Pipelines → Components → Writers
Selecting the component on the pipeline canvas opens two tabs:
Basic Information
Meta Information
The screenshot provided corresponds to the Meta Information tab.
Basic Information Tab
This tab contains fields such as:
Component Name
Description (optional)
These help users identify and organize components within large pipeline workflows.
Meta Information Tab
The Meta Information tab contains all necessary configuration fields to connect to an Azure Cosmos DB account and define write behavior.
Authentication & Connection Settings
Endpoint*
The Cosmos DB account endpoint URI. Example: https://mycosmosdbaccount.documents.azure.com:443/
MasterKey*
The account's primary (or secondary) key, used to authenticate requests. It grants full read-write access to the account, so store it securely and never hard-code it.
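Because the MasterKey grants read-write access, it should come from a secret store or the runtime environment rather than from pipeline code. The following is a minimal Python sketch, assuming the values arrive through environment variables named COSMOS_ENDPOINT and COSMOS_MASTER_KEY (hypothetical names, not settings of this component), with a basic sanity check on the endpoint format shown above.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping
from urllib.parse import urlparse


@dataclass(frozen=True)
class CosmosConnectionSettings:
    endpoint: str
    master_key: str = field(repr=False)  # keep the key out of logs and reprs


def validate_endpoint(endpoint: str, allowed_suffix: str = ".documents.azure.com") -> str:
    """Reject endpoints that are not HTTPS URIs on the expected Cosmos DB host."""
    parsed = urlparse(endpoint)
    if parsed.scheme != "https":
        raise ValueError("Cosmos DB endpoint must use https")
    if not parsed.hostname or not parsed.hostname.endswith(allowed_suffix):
        raise ValueError(f"Endpoint host should end with {allowed_suffix!r}")
    return endpoint


def load_settings(env: Mapping[str, str] = os.environ) -> CosmosConnectionSettings:
    """Read the endpoint and key from the environment instead of hard-coding them."""
    endpoint = env.get("COSMOS_ENDPOINT")  # hypothetical variable name
    master_key = env.get("COSMOS_MASTER_KEY")  # hypothetical variable name
    if not endpoint or not master_key:
        raise KeyError("Both COSMOS_ENDPOINT and COSMOS_MASTER_KEY must be set")
    return CosmosConnectionSettings(validate_endpoint(endpoint), master_key)
```

The host-suffix check defaults to the public Azure cloud; sovereign clouds and the local Cosmos DB emulator use different hosts, so adjust allowed_suffix for those environments.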
Target Database & Container
Database Name*
The name of the Cosmos DB database where data will be written.
Container Name*
The target container (collection) within the database. Cosmos DB stores JSON documents inside containers.
Partition Key
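Optional field specifying the partition key path (e.g., /device_id, /customerId). If omitted, the Writer relies on the partition key path defined on the target container (in Cosmos DB, a container's partition key is set when the container is created).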
Partition keys strongly influence performance, scalability, and distribution of stored documents.
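A partition key path names a property inside each document, with nested properties separated by further slashes. The sketch below uses a hypothetical helper (not part of the component) to show how a path such as /device_id or /address/city resolves to a value in a JSON-like document. It treats a missing value as an error, in line with the partition key mismatch advice in the troubleshooting guide, rather than relying on how the service handles documents that lack the property.

```python
from typing import Any, Dict


def partition_key_value(document: Dict[str, Any], path: str) -> Any:
    """Resolve a partition key path such as "/device_id" or "/address/city"."""
    if not path.startswith("/") or path == "/":
        raise ValueError(f"Invalid partition key path: {path!r}")
    current: Any = document
    for segment in path[1:].split("/"):
        if not isinstance(current, dict) or segment not in current:
            # Treated as an error in this sketch: a likely cause of partition key mismatches.
            raise KeyError(f"No value at partition key path {path!r}")
        current = current[segment]
    return current
```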
Write Behavior Settings
Save Mode
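Determines how data is written. Supported modes include:
Append: Inserts new documents; existing documents are never modified. In Cosmos DB, inserting a document whose id already exists in the same logical partition fails with a conflict (HTTP 409).
Overwrite / Upsert: Replaces the existing document if one with the same id exists in the same logical partition; otherwise, inserts a new document.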
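The difference between the two save modes can be modeled with an in-memory store. This is a minimal sketch of the semantics described above, not the component's implementation; the mode strings "append" and "upsert" and the function write are illustrative. Documents are keyed by partition key value plus id, because Cosmos DB enforces id uniqueness within a logical partition. How the actual Writer reports an Append conflict is not specified on this page.

```python
from typing import Any, Dict, Tuple

# Keyed by (partition key value, id): ids are unique per logical partition.
Store = Dict[Tuple[Any, str], dict]


def write(store: Store, doc: dict, pk_value: Any, save_mode: str) -> str:
    """Apply one document to the store using Append or Upsert semantics."""
    key = (pk_value, doc["id"])
    if save_mode == "append":
        if key in store:
            return "conflict"  # the existing document is left unchanged
        store[key] = dict(doc)
        return "inserted"
    if save_mode == "upsert":
        existed = key in store
        store[key] = dict(doc)  # the whole document is replaced, not merged
        return "replaced" if existed else "inserted"
    raise ValueError(f"Unknown save mode: {save_mode!r}")
```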
Column Mapping (Selected Columns Section)
The Selected Columns panel allows users to map pipeline output fields to Cosmos DB document fields.
Columns Available for Mapping
Name
The name of the input column from the pipeline dataset.
Alias Name
Optional. Overrides the field name inside the Cosmos DB document.
Column Type
Specifies the data type to store in Cosmos DB (String, Number, Boolean, Object, etc.).
Users can click Add New Column to define additional mappings. This is especially useful when:
The output dataset has more fields than required
Field renaming is needed
Data type normalization must be enforced
Custom document structure is desired
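The mapping rules (Name, Alias Name, Column Type) can be sketched as a small function that builds one document per input record. The ColumnMapping class, the converters, and the boolean parsing rule are assumptions for illustration; the component's actual type conversions are not documented here. Columns without a mapping are dropped, which corresponds to the case where the output dataset has more fields than required.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


def _to_bool(value: Any) -> bool:
    # Assumed rule: accept real booleans and common textual spellings.
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1", "yes"}


# Illustrative converters for some of the column types listed above.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "String": str,
    "Number": float,
    "Boolean": _to_bool,
    "Object": dict,
}


@dataclass
class ColumnMapping:
    name: str                    # input column in the pipeline dataset
    column_type: str = "String"  # type stored in the Cosmos DB document
    alias: Optional[str] = None  # optional property name that overrides `name`


def build_document(record: Dict[str, Any], mappings: List[ColumnMapping]) -> Dict[str, Any]:
    """Apply the mappings to one record; unmapped columns are left out."""
    document: Dict[str, Any] = {}
    for mapping in mappings:
        value = record[mapping.name]
        converted = None if value is None else CONVERTERS[mapping.column_type](value)
        document[mapping.alias or mapping.name] = converted
    return document
```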
Execution Behavior
During pipeline execution:
The Writer authenticates with Cosmos DB using the Endpoint and MasterKey.
It verifies the existence of the target Database and Container.
Each output record from the pipeline is converted into JSON format.
Column mappings are applied to construct the final Cosmos DB document.
Data is written using the defined Save Mode (Append or Upsert).
Partition key values are evaluated, and documents are routed to the correct logical partitions.
If any write fails (e.g., due to partition mismatch or rate limiting), error details are logged, and the pipeline run reflects the failure.
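The JSON conversion and partition routing steps can be approximated locally. This hypothetical sketch checks that each document has a non-empty string id (Cosmos DB requires one), verifies that an assumed top-level partition key field is present, serializes the document to JSON, and groups documents by partition key value. In the real service, Cosmos DB itself routes each document to its logical partition; the grouping here only illustrates which documents share one.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def prepare_batches(docs: Iterable[Dict[str, Any]], pk_field: str) -> Dict[Any, List[str]]:
    """Validate documents, serialize them to JSON, and group them by partition key value."""
    batches: Dict[Any, List[str]] = defaultdict(list)
    for doc in docs:
        doc_id = doc.get("id")
        if not isinstance(doc_id, str) or not doc_id:
            raise ValueError(f"Document is missing a non-empty string 'id': {doc!r}")
        if pk_field not in doc:
            raise ValueError(f"Document {doc_id!r} is missing partition key field {pk_field!r}")
        batches[doc[pk_field]].append(json.dumps(doc))
    return dict(batches)
```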
Supported Write Operations
Insert (Append mode)
Upsert (Overwrite mode, depending on configuration)
Attribute mapping for shaping JSON documents
Partition-based document routing
Unsupported operations:
DELETE
REPLACE (replace-only writes without upsert semantics)
Schema-altering operations on databases/containers
Common Use Cases
IoT Data Ingestion
Store device telemetry or sensor data in Cosmos DB for real-time retrieval by applications.
Operational Data Sync
Push enriched customer, order, or product data to cloud applications.
API Backend Data
Persist user session events, metadata, or preference information consumed by microservices.
Event-Driven Architectures
Use Cosmos DB as the operational data store behind event-processing pipelines.
Best Practices
Partition Key Selection
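Choose a key with high cardinality so that documents and write traffic spread evenly across partitions.
Avoid constant or low-cardinality keys, which concentrate writes on a few partitions (hot partitions).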
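One way to compare candidate keys before configuring the Writer is to measure how a sample of records spreads across their values. The helper below is a hypothetical sketch: it reports the number of distinct values and the share of records that land on the most frequent value. A high distinct count with a low maximum share suggests an even spread; a share near 1.0 signals a hot partition.

```python
from collections import Counter
from typing import Any, Dict, Iterable, NamedTuple


class KeyStats(NamedTuple):
    distinct_values: int  # cardinality of the candidate key in the sample
    max_share: float      # fraction of records on the most common value


def key_distribution(records: Iterable[Dict[str, Any]], field: str) -> KeyStats:
    """Summarize how evenly a candidate partition key field spreads sample records."""
    # Records missing the field are counted under the value None.
    counts = Counter(record.get(field) for record in records)
    total = sum(counts.values())
    if total == 0:
        return KeyStats(0, 0.0)
    return KeyStats(len(counts), max(counts.values()) / total)
```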
Performance
Use Append mode for massive ingestion workloads.
Use Upsert only when document updates are needed.
Schema Design
Keep documents small (Cosmos DB limits each item to 2 MB) and consistently structured.
Ensure field names are consistent across pipeline stages.
Security
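Apply least privilege: writing requires a read-write key, so limit who can access it, and give consumers that only read the read-only keys instead.
Avoid embedding keys in notebooks or scripts; keep them in secure storage such as Azure Key Vault.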
Throughput Management
Monitor request unit (RU) consumption for the target container.
Enable autoscale throughput, or manually raise provisioned RU/s, before heavy write workloads.
Troubleshooting Guide
Issue | Cause | Resolution
--- | --- | ---
Authentication Error | Wrong key or endpoint | Validate the endpoint URI and master key.
Rate Limiting (429 errors) | Requests exceeded the provisioned RU/s | Increase RU/s or implement retry logic.
Partition Key Mismatch | Missing or incorrect partition key value | Verify that the partition key field exists in the column mapping.
Document Upsert Failure | Incorrect ID or missing unique key | Ensure each document has a valid, non-empty string id property.
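For rate limiting, retry logic usually means waiting and trying again, preferring the delay the service suggests. Cosmos DB 429 responses carry a retry-after hint (the x-ms-retry-after-ms header), and the official Cosmos DB SDKs already retry throttled requests a limited number of times, so extra retries mainly matter for custom clients or sustained overload. The sketch below is generic Python, not an SDK API: ThrottledError is a hypothetical stand-in for a 429 response, and the backoff parameters are illustrative.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 (request rate too large) response."""

    def __init__(self, retry_after_s: Optional[float] = None) -> None:
        super().__init__("429: request rate too large")
        self.retry_after_s = retry_after_s  # server-suggested wait, if provided


def with_retries(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying on throttling with server-hinted or exponential delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # give up and surface the 429 to the pipeline run
            if err.retry_after_s is not None:
                delay = err.retry_after_s
            else:
                delay = base_delay_s * 2 ** (attempt - 1)  # 0.5, 1, 2, 4 s with the defaults
            sleep(delay + random.uniform(0, delay * 0.1))  # small jitter spreads out retries
    raise AssertionError("unreachable")
```

The sleep parameter exists so that tests can record delays instead of actually waiting.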