Azure Cosmos DB Writer

This Writer suits enterprise scenarios that require high availability, global distribution, and operational analytics.

The Azure Cosmos DB Writer component enables BDB Data Pipeline users to write pipeline-generated datasets into Azure Cosmos DB, Microsoft’s globally distributed, multi-model NoSQL database. It targets operational workloads that need low-latency reads, elastic scalability, and integration with cloud-native NoSQL data stores.

The Cosmos DB Writer component supports flexible schema mapping, partition key assignment, and selectable write modes to help users reliably persist transformed or enriched data.

Overview

The Azure Cosmos DB Writer provides the ability to:

  • Write structured or semi-structured data into Cosmos DB containers

  • Map pipeline fields to Cosmos DB document properties

  • Define a partition key to optimize performance

  • Control overwrite or append write behavior through Save Mode

  • Securely authenticate using an endpoint and master key

This Writer is typically used in pipelines that deliver operational data to cloud applications, API-driven services, IoT event stores, and real-time analytics solutions.

Component Placement

The component can be found under:

Data Engineering → Pipelines → Components → Writers

Selecting the component on the pipeline canvas opens two tabs:

  • Basic Information

  • Meta Information

The screenshot provided corresponds to the Meta Information tab.

Basic Information Tab

This tab contains fields such as:

  • Component Name

  • Description (optional)

These help users identify and organize components within large pipeline workflows.

Meta Information Tab

The Meta Information tab contains all necessary configuration fields to connect to an Azure Cosmos DB account and define write behavior.

Authentication & Connection Settings

| Field | Description |
| --- | --- |
| Endpoint* | The Cosmos DB account endpoint URI. Example: https://mycosmosdbaccount.documents.azure.com:443/ |
| MasterKey* | The primary key (or secondary key) used to authenticate requests. Must be stored securely. |

Note: The Endpoint and MasterKey pair must correspond to the same Cosmos DB account.
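Before saving the component, the two connection values can be sanity-checked outside the platform. The sketch below is only an illustration: the function names are hypothetical, and the host-name check assumes a public-cloud account (sovereign clouds and the local emulator use other host names).

```python
from urllib.parse import urlparse


def check_endpoint(uri: str) -> list[str]:
    """Return a list of problems found in a Cosmos DB endpoint URI (empty if none)."""
    problems = []
    parsed = urlparse(uri.strip())
    if parsed.scheme != "https":
        problems.append("endpoint should use https")
    if not parsed.hostname:
        problems.append("endpoint has no host name")
    elif not parsed.hostname.endswith(".documents.azure.com"):
        # Assumption: public-cloud account naming.
        problems.append("host does not look like a Cosmos DB account")
    return problems


def check_master_key(key: str) -> list[str]:
    """Catch common copy-and-paste mistakes in the key value."""
    problems = []
    if not key:
        problems.append("master key is empty")
    elif key != key.strip():
        problems.append("master key has leading or trailing whitespace")
    return problems


print(check_endpoint("https://mycosmosdbaccount.documents.azure.com:443/"))
```

An empty list means no obvious formatting problem was found; it does not prove that the key belongs to that account.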

Target Database & Container

| Field | Description |
| --- | --- |
| Database Name* | The name of the Cosmos DB database where data will be written. |
| Container Name* | The target container (collection) within the database. Cosmos DB stores JSON documents inside containers. |
| Partition Key | Optional field specifying the partition key path (e.g., /device_id, /customerId). If omitted, Cosmos DB’s default partitioning configuration applies. |

Partition keys strongly influence performance, scalability, and distribution of stored documents.
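To make the path notation concrete, the following minimal sketch resolves a partition key path such as /device_id against a document. The helper name `partition_value` is hypothetical and is not part of the Writer; it only shows how a path maps to a property in the JSON document.

```python
def partition_value(document: dict, path: str):
    """Return the value a partition key path points to, or None if it is absent."""
    if not path.startswith("/"):
        raise ValueError("partition key path must start with '/'")
    current = document
    # A path like /customer/id walks into nested objects one level at a time.
    for part in path.strip("/").split("/"):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


reading = {"id": "r-1", "device_id": "d-17", "temp": 21.5}
print(partition_value(reading, "/device_id"))
```

A result of None for any record is a sign that the partition key field is missing from the pipeline output or from the column mapping.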

Write Behavior Settings

| Field | Description |
| --- | --- |
| Save Mode | Determines how data is written. Supported modes typically include Append (adds new documents; existing ones remain unchanged) and Overwrite / Upsert (updates the existing document if the same ID exists; otherwise inserts a new document). |

Note: The exact modes available depend on the platform configuration.
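The difference between the two modes can be modelled with an in-memory dictionary standing in for a container. This is a simplified sketch, not the Writer's implementation: the function name and the summary keys are hypothetical, documents are keyed on id alone, and in Append mode a document whose id already exists is assumed to be skipped so the stored one stays unchanged.

```python
def write_documents(store: dict, documents: list, save_mode: str) -> dict:
    """Apply documents to an in-memory store using append or upsert semantics."""
    summary = {"inserted": 0, "replaced": 0, "skipped": 0}
    for doc in documents:
        doc_id = doc["id"]
        exists = doc_id in store
        if save_mode == "append":
            if exists:
                # Assumption: the existing document wins and is left untouched.
                summary["skipped"] += 1
                continue
            store[doc_id] = dict(doc)
            summary["inserted"] += 1
        elif save_mode == "upsert":
            store[doc_id] = dict(doc)
            summary["replaced" if exists else "inserted"] += 1
        else:
            raise ValueError(f"unknown save mode: {save_mode}")
    return summary


container = {"1": {"id": "1", "status": "old"}}
batch = [{"id": "1", "status": "new"}, {"id": "2", "status": "new"}]
print(write_documents(container, batch, "append"))
print(write_documents(container, batch, "upsert"))
```

Running the same batch twice shows the practical consequence: Append leaves document 1 as it was, while Upsert replaces it.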

Column Mapping (Selected Columns Section)

The Selected Columns panel allows users to map pipeline output fields to Cosmos DB document fields.

Columns Available for Mapping

| Field | Description |
| --- | --- |
| Name | The name of the input column from the pipeline dataset. |
| Alias Name | Optional. Overrides the field name inside the Cosmos DB document. |
| Column Type | Specifies the data type to store in Cosmos DB (String, Number, Boolean, Object, etc.). |

Users can click Add New Column to define additional mappings. This is especially useful when:

  • The output dataset has more fields than required

  • Field renaming is needed

  • Data type normalization must be enforced

  • Custom document structure is desired
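As an illustration of what a mapping does to one record, the sketch below applies Name, Alias Name, and Column Type to a Python dictionary. The mapping layout (`name`, `alias`, `type` keys), the conversion rules, and the choice to drop unmapped columns are assumptions made for the example; the Writer's actual conversion rules may differ.

```python
import json


def to_number(value):
    """Convert to int when possible, otherwise to float."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    text = str(value).strip()
    try:
        return int(text)
    except ValueError:
        return float(text)


def to_boolean(value):
    """Convert common textual spellings to a real boolean."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise ValueError(f"cannot interpret {value!r} as Boolean")


CASTS = {
    "String": str,
    "Number": to_number,
    "Boolean": to_boolean,
    "Object": lambda v: json.loads(v) if isinstance(v, str) else v,
}


def apply_mapping(record: dict, mappings: list) -> dict:
    """Build the output document from the mapped columns only."""
    document = {}
    for mapping in mappings:
        target = mapping.get("alias") or mapping["name"]
        document[target] = CASTS[mapping["type"]](record[mapping["name"]])
    return document


row = {"device_id": 17, "temp": "21.5", "active": "true", "debug": "x"}
columns = [
    {"name": "device_id", "alias": "id", "type": "String"},
    {"name": "temp", "type": "Number"},
    {"name": "active", "type": "Boolean"},
]
print(apply_mapping(row, columns))
```

Here the unmapped `debug` column is left out, `device_id` is renamed to `id`, and the textual values are normalized to typed JSON values.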

Execution Behavior

During pipeline execution:

  • The Writer authenticates with Cosmos DB using the Endpoint and MasterKey.

  • It verifies the existence of the target Database and Container.

  • Each output record from the pipeline is converted into JSON format.

  • Column mappings are applied to construct the final Cosmos DB document.

  • Data is written using the defined Save Mode (Append or Upsert).

  • Partition key values are evaluated, and documents are routed to the correct logical partitions.

If any write fails (e.g., due to partition mismatch or rate limiting), error details are logged, and the pipeline run reflects the failure.
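The failure handling described above can be pictured with a small loop that collects per-record errors and derives the run status from them. This is a sketch under stated assumptions: `run_writer` and `write_one` are hypothetical names, and whether the real Writer continues after a failed record or stops at the first one is not specified here, so the example simply continues.

```python
import json


def run_writer(records: list, write_one, partition_field: str) -> dict:
    """Write each record as JSON and report failures instead of hiding them."""
    errors = []
    written = 0
    for index, record in enumerate(records):
        try:
            if record.get(partition_field) is None:
                raise ValueError(f"missing partition key field '{partition_field}'")
            write_one(json.dumps(record))
            written += 1
        except Exception as exc:  # broad on purpose: every failure is logged
            errors.append({"record": index, "error": str(exc)})
    status = "failed" if errors else "succeeded"
    return {"status": status, "written": written, "errors": errors}


sink = []  # stands in for the target container
result = run_writer(
    [{"id": "1", "device_id": "a"}, {"id": "2"}],
    sink.append,
    "device_id",
)
print(result["status"], result["errors"])
```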

Supported Write Operations

  • Insert (Append mode)

  • Upsert (Overwrite mode, depending on configuration)

  • Attribute mapping for shaping JSON documents

  • Partition-based document routing

Unsupported operations:

  • DELETE

  • REPLACE document without upsert semantics

  • Schema-altering operations on databases/containers

Common Use Cases

IoT Data Ingestion

Store device telemetry or sensor data in Cosmos DB for real-time retrieval by applications.

Operational Data Sync

Push enriched customer, order, or product data to cloud applications.

API Backend Data

Persist user session events, metadata, or preference information consumed by microservices.

Event-Driven Architectures

Use Cosmos DB as the operational data store behind event-processing pipelines.

Best Practices

Partition Key Selection

  • Choose a key with high cardinality to maximize distribution.

  • Avoid static keys that cause hotspotting.
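One rough way to compare candidate keys is to profile a sample of pipeline output before choosing. The sketch below counts distinct values and the share taken by the most frequent one; the function name `key_profile` and the sample data are made up for illustration.

```python
from collections import Counter


def key_profile(records: list, field: str) -> dict:
    """Summarize how evenly a candidate partition key spreads a sample."""
    if not records:
        raise ValueError("need at least one record to profile")
    counts = Counter(record.get(field) for record in records)
    top_value, top_count = counts.most_common(1)[0]
    return {
        "distinct": len(counts),
        "top_value": top_value,
        "top_share": top_count / len(records),
    }


sample = [{"device_id": f"d{i}", "region": "eu"} for i in range(4)]
print(key_profile(sample, "device_id"))
print(key_profile(sample, "region"))
```

A field with few distinct values, or one value holding most of the records, is a likely hotspot; in this sample `region` puts every record in a single partition.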

Performance

  • Use Append mode for massive ingestion workloads.

  • Use Upsert only when document updates are needed.

Schema Design

  • Keep documents small and structured.

  • Ensure field names are consistent across pipeline stages.

Security

  • Use read/write keys with the least privilege needed.

  • Avoid embedding keys in notebooks or scripts; use secure storage.
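For scripts that run alongside the pipeline, one common pattern is to read the connection values from the environment rather than from source code. The variable names COSMOS_ENDPOINT and COSMOS_MASTER_KEY below are hypothetical, and a managed secret store would serve the same purpose.

```python
import os


def load_cosmos_credentials(env=os.environ) -> tuple:
    """Read the endpoint and key from environment variables, failing loudly if absent."""
    names = ("COSMOS_ENDPOINT", "COSMOS_MASTER_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["COSMOS_ENDPOINT"], env["COSMOS_MASTER_KEY"]


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key is safe to log."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


print(mask("abcdefgh"))
```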

Throughput Management

  • Monitor request unit (RU) consumption in Cosmos DB.

  • Scale throughput automatically or manually during heavy writes.

Troubleshooting Guide

| Issue | Possible Cause | Recommended Action |
| --- | --- | --- |
| Authentication Error | Wrong key or endpoint | Validate endpoint URI and master key. |
| Rate Limiting (429 errors) | Exceeded RU limits | Increase RU/s or implement retry logic. |
| Partition Key Mismatch | Missing or incorrect partition value | Verify partition key field exists in mapping. |
| Document Upsert Failure | Incorrect ID or missing unique key | Ensure each document has a valid ID property. |
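The retry logic recommended for 429 errors is usually an exponential backoff with some random jitter. The sketch below shows the idea for custom code that writes to Cosmos DB; `RateLimitedError` is a hypothetical stand-in for whatever error the client raises on a 429 response, and the attempt count and delays are illustrative defaults, not platform settings.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for an HTTP 429 (request rate too large) response."""


def write_with_retry(write, document, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call write(document), backing off exponentially while it is rate limited."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write(document)
        except RateLimitedError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter spreads out retries from parallel writers.
            sleep(delay + random.uniform(0, delay / 2))


attempts = {"count": 0}


def flaky_write(doc):
    # Simulated target that rejects the first two calls.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitedError()
    return doc["id"]


print(write_with_retry(flaky_write, {"id": "1"}, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; in real use the default `time.sleep` applies.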
