Azure Cosmos DB Writer

This Writer is ideal for enterprise scenarios requiring high availability, global distribution, and operational analytics workloads.

The Azure Cosmos DB Writer component enables BDB Data Pipeline users to write pipeline-generated datasets into Azure Cosmos DB, Microsoft’s globally distributed, multi-model NoSQL database. This Writer is ideal for operational workloads that require low-latency reads, elastic scalability, and integration with cloud-native NoSQL data stores.

The Cosmos DB Writer component supports flexible schema mapping, partition key assignment, and selectable write modes to help users reliably persist transformed or enriched data.

Overview

The Azure Cosmos DB Writer provides the ability to:

Write structured or semi-structured data into Cosmos DB containers
Map pipeline fields to Cosmos DB document properties
Define a partition key to optimize performance
Control overwrite or append write behavior through Save Mode
Securely authenticate using an endpoint and master key

This Writer is typically used in pipelines that deliver operational data to cloud applications, API-driven services, IoT event stores, and real-time analytics solutions.

Component Placement

The component can be found under:

Data Engineering → Pipelines → Components → Writers

Selecting the component on the pipeline canvas opens two tabs:

Basic Information
Meta Information

The screenshot provided corresponds to the Meta Information tab.

Basic Information Tab

This tab contains fields such as:

Component Name
Description (optional)

These help users identify and organize components within large pipeline workflows.

Meta Information Tab

The Meta Information tab contains all necessary configuration fields to connect to an Azure Cosmos DB account and define write behavior.

Authentication & Connection Settings

| Field | Description |
| --- | --- |
| Endpoint* | The Cosmos DB account endpoint URI. Example: https://mycosmosdbaccount.documents.azure.com:443/ |
| MasterKey* | The primary key (or secondary key) used to authenticate requests. Must be stored securely. |

Note: The Endpoint and MasterKey pair must correspond to the same Cosmos DB account.
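As a minimal sketch of checking these two values before they are entered into the component, the snippet below validates the shape of an endpoint URI and reads the key from an environment variable rather than hard-coding it. The function names and the variable name COSMOS_MASTER_KEY are hypothetical and are not part of the BDB platform or any Azure SDK.

```python
import os
from urllib.parse import urlparse


def validate_endpoint(endpoint: str) -> str:
    """Return the host name if the endpoint is an HTTPS URI, else raise ValueError."""
    parsed = urlparse(endpoint)
    if parsed.scheme != "https":
        raise ValueError(f"Endpoint must use https, got {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("Endpoint has no host name")
    return parsed.hostname


def load_master_key(env_var: str = "COSMOS_MASTER_KEY") -> str:
    """Read the key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


host = validate_endpoint("https://mycosmosdbaccount.documents.azure.com:443/")
print(host)  # mycosmosdbaccount.documents.azure.com
```

This only checks the form of the values. Whether the endpoint and key belong to the same account can be confirmed only by an actual authenticated request.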

Target Database & Container

| Field | Description |
| --- | --- |
| Database Name* | The name of the Cosmos DB database where data will be written. |
| Container Name* | The target container (collection) within the database. Cosmos DB stores JSON documents inside containers. |
| Partition Key | Optional field specifying the partition key path (e.g., /device_id, /customerId). If omitted, Cosmos DB’s default partitioning configuration applies. |

Partition keys strongly influence performance, scalability, and distribution of stored documents.
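To illustrate how a partition key path relates to document content, here is a small sketch that resolves a path such as /device_id against a document and groups documents by the resulting value. The helper names are hypothetical, and raising an error for a missing value is a choice made for this sketch, not a description of how the Writer or Cosmos DB behaves.

```python
from collections import defaultdict


def partition_value(document: dict, key_path: str):
    """Resolve a partition key path such as '/device_id' or '/address/city'."""
    value = document
    for part in key_path.strip("/").split("/"):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(f"Document has no value at partition key path {key_path!r}")
        value = value[part]
    return value


def group_by_partition(documents: list, key_path: str) -> dict:
    """Map each partition key value to the ids of the documents that carry it."""
    groups = defaultdict(list)
    for doc in documents:
        groups[partition_value(doc, key_path)].append(doc["id"])
    return dict(groups)


docs = [
    {"id": "1", "device_id": "sensor-a", "temp": 21.5},
    {"id": "2", "device_id": "sensor-b", "temp": 19.0},
    {"id": "3", "device_id": "sensor-a", "temp": 22.1},
]
print(group_by_partition(docs, "/device_id"))
# {'sensor-a': ['1', '3'], 'sensor-b': ['2']}
```

Documents that share a partition key value land in the same logical partition, which is why the choice of path affects how evenly data and load are spread.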

Write Behavior Settings

| Field | Description |
| --- | --- |
| Save Mode | Determines how data is written. Supported modes typically include Append and Overwrite / Upsert. |

Append: Adds new documents; existing documents remain unchanged.
Overwrite / Upsert: Updates an existing document if one with the same ID exists; otherwise, inserts a new document.

Note: The exact modes available depend on the platform configuration.
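The difference between the two modes can be sketched with an in-memory dictionary standing in for a container. This is an illustration of the semantics described above, not the Writer's implementation: the function names are hypothetical, documents are keyed by id alone for simplicity, and how the actual component reports an ID conflict in Append mode is not specified here.

```python
def append(store: dict, documents: list) -> list:
    """Insert only documents whose id is new; return the ids that were skipped."""
    skipped = []
    for doc in documents:
        if doc["id"] in store:
            skipped.append(doc["id"])  # existing document stays unchanged
        else:
            store[doc["id"]] = doc
    return skipped


def upsert(store: dict, documents: list) -> None:
    """Replace a document with the same id, or insert it if the id is new."""
    for doc in documents:
        store[doc["id"]] = doc


store = {"1": {"id": "1", "status": "new"}}
batch = [{"id": "1", "status": "shipped"}, {"id": "2", "status": "new"}]

skipped = append(store, batch)
print(skipped, store["1"]["status"])  # ['1'] new

upsert(store, batch)
print(store["1"]["status"])  # shipped
```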

Column Mapping (Selected Columns Section)

The Selected Columns panel allows users to map pipeline output fields to Cosmos DB document fields.

Columns Available for Mapping

| Field | Description |
| --- | --- |
| Name | The name of the input column from the pipeline dataset. |
| Alias Name | Optional. Overrides the field name inside the Cosmos DB document. |
| Column Type | Specifies the data type to store in Cosmos DB (String, Number, Boolean, Object, etc.). |

Users can click Add New Column to define additional mappings. This is especially useful when:

The output dataset has more fields than required
Field renaming is needed
Data type normalization must be enforced
Custom document structure is desired
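A rough sketch of what such a mapping does to one record is shown below: unmapped fields are dropped, aliases rename fields, and the column type drives a conversion. The mapping structure, the converter table, and the function names are assumptions made for illustration; the component's internal representation may differ.

```python
import json


def to_bool(value):
    """Interpret common textual truth values; fall back to Python truthiness."""
    if isinstance(value, str):
        return value.strip().lower() in ("true", "1", "yes")
    return bool(value)


# Converters for the Column Type values named in this section (assumed behavior).
CONVERTERS = {
    "String": str,
    "Number": float,
    "Boolean": to_bool,
    "Object": lambda value: value,  # passed through unchanged
}


def apply_mapping(record: dict, mappings: list) -> dict:
    """Build a document containing only the mapped fields, renamed and converted."""
    document = {}
    for mapping in mappings:
        target = mapping.get("alias") or mapping["name"]
        convert = CONVERTERS[mapping["type"]]
        document[target] = convert(record[mapping["name"]])
    return document


record = {"order_id": 1001, "cust": "C-17", "total": "249.90", "paid": "true", "debug": "x"}
mappings = [
    {"name": "order_id", "alias": "id", "type": "String"},
    {"name": "cust", "alias": "customerId", "type": "String"},
    {"name": "total", "type": "Number"},
    {"name": "paid", "type": "Boolean"},
]
document = apply_mapping(record, mappings)
print(json.dumps(document))
# {"id": "1001", "customerId": "C-17", "total": 249.9, "paid": true}
```

Here the debug field is left out because it has no mapping, and order_id becomes the document's id property through its alias.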

Execution Behavior

During pipeline execution:

The Writer authenticates with Cosmos DB using the Endpoint and MasterKey.
It verifies the existence of the target Database and Container.
Each output record from the pipeline is converted into JSON format.
Column mappings are applied to construct the final Cosmos DB document.
Data is written using the defined Save Mode (Append or Upsert).
Partition key values are evaluated, and documents are routed to the correct logical partitions.

If any write fails (e.g., due to partition mismatch or rate limiting), error details are logged, and the pipeline run reflects the failure.
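The per-record flow and failure handling described above might look roughly like the following sketch. The two callables, build_document and write_document, are hypothetical placeholders for the mapping and write steps; the real component's internals are not documented here.

```python
import json
import logging

logger = logging.getLogger("cosmos_writer_sketch")


def run_writer(records, build_document, write_document):
    """Convert, write, and log each record; return (written, failed) counts."""
    written = failed = 0
    for record in records:
        try:
            document = build_document(record)
            json.dumps(document)  # fails early if the document is not JSON-serializable
            write_document(document)
            written += 1
        except Exception as exc:
            failed += 1
            logger.error("Write failed for record %r: %s", record, exc)
    return written, failed


sink = []


def write_document(document):
    """Stand-in for the real write: rejects documents without an id."""
    if "id" not in document:
        raise ValueError("document has no id")
    sink.append(document)


result = run_writer(
    [{"id": "1"}, {"name": "no-id"}],
    build_document=dict,
    write_document=write_document,
)
print(result)  # (1, 1)
```

A non-zero failed count is what a pipeline run would surface as a failure, with the details available in the log.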

Supported Write Operations

Insert (Append mode)
Upsert (Overwrite mode, depending on configuration)
Attribute mapping for shaping JSON documents
Partition-based document routing

Unsupported operations:

DELETE
REPLACE document without upsert semantics
Schema-altering operations on databases/containers

Common Use Cases

IoT Data Ingestion

Store device telemetry or sensor data in Cosmos DB for real-time retrieval by applications.

Operational Data Sync

Push enriched customer, order, or product data to cloud applications.

API Backend Data

Persist user session events, metadata, or preference information consumed by microservices.

Event-Driven Architectures

Use Cosmos DB as the operational data store behind event-processing pipelines.

Best Practices

Partition Key Selection

Choose a key with high cardinality to maximize distribution.
Avoid static keys that cause hotspotting.
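One way to compare candidate keys before committing to one is to measure them on a sample of pipeline output. The sketch below reports the number of distinct values and the share of records held by the most common value; the function name and the sample data are made up for illustration.

```python
from collections import Counter


def key_distribution(records, field):
    """Return (distinct value count, share of records held by the most common value)."""
    counts = Counter(record[field] for record in records)
    top_share = counts.most_common(1)[0][1] / len(records)
    return len(counts), top_share


records = [
    {"device_id": f"sensor-{i % 50}", "region": "eu" if i % 10 else "us"}
    for i in range(1000)
]
print(key_distribution(records, "device_id"))  # (50, 0.02)
print(key_distribution(records, "region"))     # (2, 0.9)
```

In this sample, device_id spreads records evenly over 50 values, while region concentrates 90% of them under a single value, which is the kind of skew that leads to hotspotting.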

Performance

Use Append mode for massive ingestion workloads.
Use Upsert only when document updates are needed.

Schema Design

Keep documents small and structured.
Ensure field names are consistent across pipeline stages.
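Both points can be checked on a sample before writing. The sketch below estimates a document's serialized size and lists field names that appear in some documents but not in others. The 2 MB ceiling is an assumption based on the commonly documented Cosmos DB item size limit and should be confirmed against the current service limits; the function names are hypothetical.

```python
import json

MAX_ITEM_BYTES = 2 * 1024 * 1024  # assumed ceiling; confirm against current service limits


def document_size(document: dict) -> int:
    """Approximate the stored size as the length of the compact JSON encoding."""
    return len(json.dumps(document, separators=(",", ":")).encode("utf-8"))


def inconsistent_fields(documents: list) -> set:
    """Return field names that are present in some documents but not in all."""
    if not documents:
        return set()
    field_sets = [set(doc) for doc in documents]
    return set.union(*field_sets) - set.intersection(*field_sets)


docs = [
    {"id": "1", "customerId": "C-17", "total": 249.9},
    {"id": "2", "customer_id": "C-18", "total": 12.0},
]
print(document_size(docs[0]) <= MAX_ITEM_BYTES)  # True
print(sorted(inconsistent_fields(docs)))  # ['customerId', 'customer_id']
```

The second result flags the customerId / customer_id naming drift that tends to appear when different pipeline stages rename fields independently.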

Security

Use read/write keys with the least privilege needed.
Avoid embedding keys in notebooks or scripts; use secure storage.

Throughput Management

Monitor RU consumption inside Cosmos DB.
Scale throughput automatically or manually during heavy writes.

Troubleshooting Guide

| Issue | Possible Cause | Recommended Action |
| --- | --- | --- |
| Authentication Error | Wrong key or endpoint | Validate endpoint URI and master key. |
| Rate Limiting (429 errors) | Exceeded RU limits | Increase RU/s or implement retry logic. |
| Partition Key Mismatch | Missing or incorrect partition value | Verify partition key field exists in mapping. |
| Document Upsert Failure | Incorrect ID or missing unique key | Ensure each document has a valid ID property. |
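For the rate-limiting case, the retry logic mentioned above is commonly implemented as exponential backoff with jitter. The sketch below is one such implementation under stated assumptions: RateLimited is a hypothetical stand-in for a 429 response, and write is whatever callable performs the actual write. It does not reflect how the Writer itself retries.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the service."""


def write_with_retry(write, document, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call write(document), retrying on RateLimited with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write(document)
        except RateLimited:
            if attempt == max_attempts:
                raise
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))


attempts = []


def flaky_write(document):
    """Fails twice with RateLimited, then succeeds."""
    attempts.append(document["id"])
    if len(attempts) < 3:
        raise RateLimited()
    return "ok"


delays = []
result = write_with_retry(flaky_write, {"id": "1"}, sleep=delays.append)
print(result, len(attempts), len(delays))  # ok 3 2
```

Passing sleep as a parameter keeps the example fast to run and test. A production client would typically also honor any retry-after hint returned with the 429 response instead of relying on a fixed schedule alone.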

