# Offset Mapping

Offset Mapping is a Pipeline feature that preserves data continuity when a component is modified or replaced. It lets a new or substitute component inherit the processing state and offset markers of its predecessor, so the data stream continues without interruption.

Rather than initializing processing from the stream's origin, the successor component resumes operations from the exact **offset position** where the previous component ceased activity. This mechanism is vital for maintaining the integrity of production pipelines—particularly those consuming from event-driven architectures such as **Kafka topics** or **MongoDB** change streams.

By synchronizing these states, Offset Mapping guards against two risks:

* **Data Redundancy:** Records that were already handled are not reprocessed.
* **Data Gaps:** No in-flight information is lost during pipeline evolution or maintenance.
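
The difference between resuming and restarting can be illustrated with a minimal sketch. It assumes a hypothetical in-memory list standing in for the source stream, where a record's list index plays the role of its offset; `STREAM` and `consume` are illustrative names, not platform APIs.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory stand-in for a source stream: list index == offset.
STREAM = ["r0", "r1", "r2", "r3", "r4", "r5"]


def consume(start_offset: int, max_records: Optional[int] = None) -> Tuple[List[str], int]:
    """Read records from start_offset; return them with the next offset to read."""
    if max_records is None:
        end = len(STREAM)
    else:
        end = min(len(STREAM), start_offset + max_records)
    return STREAM[start_offset:end], end


# The original component handles three records and is then stopped.
old_records, committed_offset = consume(0, max_records=3)

# With an inherited offset, the successor picks up at the committed position.
resumed_records, _ = consume(committed_offset)

# Without one, a fresh component would start again from the origin.
restarted_records, _ = consume(0)

# True when the two components together cover the stream with no gaps or repeats.
print(old_records + resumed_records == STREAM)
# Records that a restart from the origin would process a second time.
print(sorted(set(old_records) & set(restarted_records)))
```

In this toy model the committed offset is the only state that has to move from the old component to the new one.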

### Key Technical Advantages

<table data-header-hidden><thead><tr><th width="175.60003662109375"></th><th></th></tr></thead><tbody><tr><td><strong>Feature</strong></td><td><strong>Description</strong></td></tr><tr><td><strong>State Inheritance</strong></td><td>Automatically transfers the offset state from an inactive or replaced component to a new one.</td></tr><tr><td><strong>Stream Integrity</strong></td><td>Guarantees "exactly-once" processing semantics by avoiding full-stream restarts.</td></tr><tr><td><strong>Pipeline Scalability</strong></td><td>Enables the evolution of pipeline logic without disrupting the flow of real-time data.</td></tr></tbody></table>

### Key Capabilities

<table><thead><tr><th width="232.39996337890625">Capability</th><th>Description</th></tr></thead><tbody><tr><td><strong>State-Preserving Replacement</strong></td><td>Swap out an existing component (e.g., SQL Component, Python Script) with a new instance while preserving the consumer offset of the original.</td></tr><tr><td><strong>Zero Data Loss</strong></td><td>The new component begins consumption from the last committed offset of the replaced component, guaranteeing no records are missed.</td></tr><tr><td><strong>Zero Data Duplication</strong></td><td>Prevents reprocessing of records already handled by the previous component, which matters most when downstream sinks are not idempotent.</td></tr><tr><td><strong>Pipeline Continuity</strong></td><td>Enables in-place upgrades, logic changes, or configuration adjustments without re-running the entire pipeline from origin.</td></tr><tr><td><strong>Timestamp-Based Offset Reference</strong></td><td>Offset positions are tracked using ISO 8601-style timestamps with a bracketed zone ID suffix (e.g., <code>2026-04-09T05:18:20.476592378Z[GMT]</code>), making them human-readable and auditable.</td></tr></tbody></table>
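
When auditing offsets outside the platform, the timestamp may need to be parsed. The example value carries nine fractional digits and a bracketed `[GMT]` suffix, while Python's `datetime` stores microseconds only, so the sketch below truncates the fraction and drops the suffix. The helper name `parse_offset_timestamp` is hypothetical, and the sketch assumes the value always ends in `Z`.

```python
import re
from datetime import datetime, timezone

# Matches strings such as 2026-04-09T05:18:20.476592378Z[GMT]
_OFFSET_TS = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})T(?P<time>\d{2}:\d{2}:\d{2})"
    r"(?:\.(?P<frac>\d{1,9}))?Z(?:\[[^\]]+\])?$"
)


def parse_offset_timestamp(text: str) -> datetime:
    """Parse an offset timestamp into a UTC datetime (microsecond precision)."""
    match = _OFFSET_TS.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised offset timestamp: {text!r}")
    # datetime keeps at most six fractional digits, so nanoseconds are truncated.
    micros = (match["frac"] or "").ljust(6, "0")[:6]
    base = datetime.strptime(f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(micros), tzinfo=timezone.utc)


ts = parse_offset_timestamp("2026-04-09T05:18:20.476592378Z[GMT]")
print(ts.isoformat())
```

Truncating to microseconds loses the last three digits, so the original string should be kept alongside the parsed value if exact comparison matters.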

### When to Use Offset Mapping

Use Offset Mapping in any scenario where a pipeline component must be **replaced, upgraded, or reconfigured without losing its processing checkpoint**. Typical use cases include:

* Modifying a SQL transformation query without reprocessing the full source stream
* Replacing a Python Script component with a new logic version
* Recovering a failed component by substituting it with a freshly configured one
* Splitting the responsibilities of a single component across multiple new components while preserving continuity

In high-throughput environments, Offset Mapping is a critical safeguard. Without it, replacing a component would require a manual offset reset, adding operational overhead and risking inconsistencies in downstream analytics. Offset Mapping automates this transition.

{% hint style="warning" %}

#### Prerequisites:

Before configuring Offset Mapping, ensure the following:

* The pipeline contains at least one component whose offset state you wish to inherit.
* The original component has been **stopped** (confirmed by a toast such as "*Sandbox Writer\_1 stopped successfully*" or the equivalent for the target component).
* You have edit permissions on the pipeline.
* The new component's schema and configuration are compatible with the data contract expected by downstream events.
  {% endhint %}

### Configuring Offset Mapping

{% stepper %}
{% step %}

#### **Stop the Existing Pipeline Component**

* Navigate to a pipeline workflow.
* Stop the running component whose state will be inherited.
* A confirmation message appears at the top of the canvas (e.g., *"SQL Component\_1 stopped successfully"*), indicating the component has been safely halted.
  {% endstep %}

{% step %}

#### **Add the Replacement Component**

* Open the **Components** panel on the right side of the Pipeline Editor.
* Search for the required component (e.g., *SQL Component*) under the **Transformations** category.
* **Drag and drop** the component onto the canvas.
* The component is named automatically, with a numeric suffix that reflects its order of appearance in the workflow. For example, *SQL Component\_2* is the second SQL Component added to the current workflow.
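
The naming rule can be sketched as follows. This is an assumption about how the suffix is derived (one more than the highest suffix already used for that component type); the platform's actual logic is not documented here, and `next_component_name` is a hypothetical helper.

```python
import re
from typing import Iterable


def next_component_name(component_type: str, existing_names: Iterable[str]) -> str:
    """Return '<type>_<n>' where n is one more than the highest suffix in use."""
    pattern = re.compile(rf"^{re.escape(component_type)}_(\d+)$")
    used = [int(m.group(1)) for name in existing_names if (m := pattern.match(name))]
    return f"{component_type}_{max(used, default=0) + 1}"


# Component names already on the canvas (taken from the examples on this page).
canvas = ["SQL Component_1", "Sandbox Writer_1"]
print(next_component_name("SQL Component", canvas))
```
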
  {% endstep %}

{% step %}

#### **Configure Basic Information**

In the **Basic Information** tab of the new component, configure:

* **Invocation Type** *(Mandatory):* Select the invocation mode from the drop-down.
* **Deployment Type:** Displays the deployment mode (e.g., `spark`).
* **Batch Size (min 1):** Records per batch; minimum value of **1**.
* **Failover Event:** Event to which records are routed on failure. Defaults to **None**.
* **Container Image Version:** Auto-populated container image version (e.g., `9.5.2404`).
* **Partitioning Factor:** Number of partitions used for parallel processing.
* **Description:** Optional free-text description.
  {% endstep %}

{% step %}

#### **Configure Meta Information**

In the **Meta Information** tab:

* **Query Type** *(Mandatory):* Select the query execution mode (e.g., **Batch Query**).
* **Schema File Name:** Upload the schema definition, if required.
* **Table Name** *(Mandatory):* Logical table name referenced within the query (e.g., `df`).
* **Query** *(Mandatory):* The SQL statement to execute (e.g., `select * from df`).
* **Selected Columns:** Optionally define specific output columns — *Name*, *Alias Name*, *Column Type*. Use **Add New Column** to add entries.
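
To show how **Table Name** and **Query** relate, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the component's SQL engine. The component itself runs on a different runtime, so this only illustrates the relationship between the fields; the sample rows, the `label` column, and the `record_id` alias are invented for the example.

```python
import sqlite3

# Hypothetical incoming batch; column names are illustrative only.
batch = [(1, "alpha"), (2, "beta"), (3, "gamma")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (id INTEGER, label TEXT)")  # Table Name: df
conn.executemany("INSERT INTO df VALUES (?, ?)", batch)

# Query field: the example statement from this step.
all_rows = conn.execute("select * from df").fetchall()

# Selected Columns analogue: pick one column and give it an alias.
cursor = conn.execute("select id as record_id from df")
aliased_columns = [col[0] for col in cursor.description]
aliased_rows = cursor.fetchall()
conn.close()

print(all_rows)
print(aliased_columns, aliased_rows)
```

The query only resolves because the table it names (`df`) matches the value entered in **Table Name**.
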
  {% endstep %}

{% step %}

#### **Save the Component**

* Click the **Save Component In Storage** (<img src="/files/8l5XRElsRmbeMF7HO6tx" alt="" data-size="line">) icon at the top-right of the configuration panel.
* A confirmation toast — **"Component properties saved."** — confirms the component has been persisted.

  <figure><img src="/files/g43lFIcyBOW7CEef24Rc" alt=""><figcaption></figcaption></figure>

{% endstep %}

{% step %}

#### **Connect the Component in the Pipeline**

* Connect the input and output ports of the new component to the appropriate upstream and downstream events, completing the data flow path.
* Save the pipeline. A **"Pipeline updation success"** toast confirms the update.

  <figure><img src="/files/eNHzU1zuoTCN4swqHMAG" alt=""><figcaption></figcaption></figure>

{% endstep %}

{% step %}

#### **Remove the Obsolete Component (Optional)**

* If the old component is being fully replaced, right-click the old component and select **Delete** from the context menu.

  <figure><img src="/files/vJQxKVzlK74Xmtg32D7r" alt=""><figcaption></figcaption></figure>

{% endstep %}

{% step %}

#### **Open the Configuration Tab of the New Component**

* Click the new component and navigate to the **Configuration** tab.
* The Configuration tab exposes execution parameters and — critically — the **Offset Mapping** field.
  {% endstep %}

{% step %}

#### **Apply the Offset Mapping**

* Locate the **Offset Mapping** drop-down in the Configuration tab.

* Open the drop-down. It displays available offsets from previously executed components in the pipeline, shown as **timestamp entries** with the source component name beneath. For example:

  ```
     Offset Mapping: 2026-04-09T05:18:20.476592378Z[GMT]
                     SQL Component_1-yqZV
  ```

* Select the offset corresponding to the component whose state you wish to inherit (typically the component being replaced).

  <figure><img src="/files/SJHoBdo1VF0KTxhNd3Ah" alt=""><figcaption></figcaption></figure>
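
If the drop-down lists entries from several components, picking the right one amounts to matching on the component label. The sketch below assumes each label is the component name followed by `-` and a short suffix, as in the example above; `OffsetEntry` and `entries_for` are hypothetical names, and the second entry is invented.

```python
from typing import List, NamedTuple


class OffsetEntry(NamedTuple):
    timestamp: str  # as displayed in the drop-down
    component: str  # source component label shown beneath the timestamp


def entries_for(component_name: str, entries: List[OffsetEntry]) -> List[OffsetEntry]:
    """Keep entries whose label is the component name plus a '-<suffix>' tail."""
    prefix = component_name + "-"
    return [e for e in entries if e.component.startswith(prefix)]


available = [
    OffsetEntry("2026-04-09T05:18:20.476592378Z[GMT]", "SQL Component_1-yqZV"),
    # The entry below is invented for illustration.
    OffsetEntry("2026-04-08T11:02:41.120000000Z[GMT]", "Sandbox Writer_1-aB3d"),
]

matches = entries_for("SQL Component_1", available)
print(matches[0].timestamp if matches else "no offset committed yet")
```

Matching on the name plus the `-` separator avoids confusing *SQL Component\_1* with a later *SQL Component\_10*.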

{% endstep %}

{% step %}

#### **Configure Driver and Executor Resources**

In the same **Configuration** tab, review and adjust the runtime resources:

**Driver**

<table><thead><tr><th width="272.39996337890625">Field</th><th>Description</th></tr></thead><tbody><tr><td><strong>Core (min 0.1)</strong> <em>(Mandatory)</em></td><td>CPU cores allocated to the driver process (e.g., <code>0.5</code>).</td></tr><tr><td><strong>Memory (min 250)</strong> <em>(Mandatory)</em></td><td>Memory in MB allocated to the driver (e.g., <code>1024</code>).</td></tr></tbody></table>

**Executor**

<table><thead><tr><th width="307.60003662109375">Field</th><th>Description</th></tr></thead><tbody><tr><td><strong>Core (min 1)</strong> <em>(Mandatory)</em></td><td>CPU cores per executor (e.g., <code>1</code>).</td></tr><tr><td><strong>Memory (min 250)</strong> <em>(Mandatory)</em></td><td>Memory in MB per executor (e.g., <code>1024</code>).</td></tr><tr><td><strong>Instances (min 1, max 1)</strong> <em>(Mandatory)</em></td><td>Number of executor instances (e.g., <code>1</code>).</td></tr><tr><td><strong>Max Instances (min 1)</strong> <em>(Mandatory)</em></td><td>Maximum executor instances for autoscaling (e.g., <code>1</code>).</td></tr></tbody></table>

**Additional fields**

* **Node Pool:** Select the target node pool for execution.
* **Intelligent Scaling:** Enable the checkbox to allow the platform to auto-tune resource allocation.
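
The minimums shown in the field labels can be checked before saving. The sketch below encodes only the limits listed in the tables above; the class and function names are hypothetical and do not correspond to a platform API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DriverResources:
    core: float     # Core (min 0.1)
    memory_mb: int  # Memory (min 250)


@dataclass
class ExecutorResources:
    core: float         # Core (min 1)
    memory_mb: int      # Memory (min 250)
    instances: int      # Instances (min 1, max 1)
    max_instances: int  # Max Instances (min 1)


def validate(driver: DriverResources, executor: ExecutorResources) -> List[str]:
    """Return a list of violations of the limits shown in the form labels."""
    problems = []
    if driver.core < 0.1:
        problems.append("driver core must be at least 0.1")
    if driver.memory_mb < 250:
        problems.append("driver memory must be at least 250 MB")
    if executor.core < 1:
        problems.append("executor core must be at least 1")
    if executor.memory_mb < 250:
        problems.append("executor memory must be at least 250 MB")
    if executor.instances != 1:
        problems.append("executor instances must be exactly 1 (min 1, max 1)")
    if executor.max_instances < 1:
        problems.append("executor max instances must be at least 1")
    return problems


# Values from the examples in the tables: no violations expected.
print(validate(DriverResources(0.5, 1024), ExecutorResources(1, 1024, 1, 1)))
# An undersized driver: two violations expected.
print(validate(DriverResources(0.05, 128), ExecutorResources(1, 1024, 1, 1)))
```
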
  {% endstep %}

{% step %}

#### **Save the Pipeline**

* Click the **Update Pipeline** (<img src="/files/dXcrishchXuS5o8MktTI" alt="" data-size="line">) icon in the top-right toolbar.
* A confirmation toast — **"Pipeline updation success."** — confirms that the changes have been persisted.

{% hint style="info" %}
**Note:** If prompted by a **"Reload site? Changes you made may not be saved."** dialog, click **Reload** only after confirming your changes have already been saved; otherwise, click **Cancel** to save first.
{% endhint %}
{% endstep %}

{% step %}

#### **Activate the Pipeline**

* Click the **Activate Pipeline** (▶) icon in the toolbar.
* A **Confirm** dialog appears: *"Do you want to activate the pipeline?"*
* Click **Yes** to activate. The pipeline header indicator turns **green**, confirming the active state.

  <figure><img src="/files/4HTiGIEqpYA65LoYHh7t" alt=""><figcaption></figcaption></figure>

{% endstep %}

{% step %}

#### **Verify Execution via Logs**

* Click the **Log** icon in the top-right toolbar.
* The Log panel opens with two tabs:
  * **Logs** — Displays SUCCESS and ERROR events in chronological order, including messages such as *"SQL Component\_1 started processing"*, *"SQL Component\_1 sending data to Event"*, and *"mongodb-reader-lite-cp-1-...successfully sent data"*.
  * **Component Status** — Lists live component processes with timestamps (e.g., *mongodb-reader-lite-cp-1-0*, *sandbox-writer-1-Driver*, *sql-component-2-Driver*).
* A message appears to confirm that the log view is ready.

  <figure><img src="/files/xte7Csc6z0IUfSGUqXlD" alt=""><figcaption></figcaption></figure>
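
If log lines are copied out of the **Logs** tab for a quick check, the two messages of interest can be searched for as plain text. The sketch below assumes a simple one-message-per-line layout, which may differ from the real log format; `markers_seen` is a hypothetical helper.

```python
from typing import Dict, Iterable

EXPECTED_MARKERS = ("started processing", "sending data to Event")


def markers_seen(component: str, log_lines: Iterable[str]) -> Dict[str, bool]:
    """Report which expected messages appear for the given component."""
    seen = {marker: False for marker in EXPECTED_MARKERS}
    for line in log_lines:
        if component not in line:
            continue
        for marker in EXPECTED_MARKERS:
            if marker in line:
                seen[marker] = True
    return seen


# Hypothetical lines copied from the Logs tab; the real layout may differ.
copied = [
    "SUCCESS SQL Component_2 started processing",
    "SUCCESS SQL Component_2 sending data to Event",
]
status = markers_seen("SQL Component_2", copied)
print(all(status.values()))
```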

{% hint style="info" %}
**Note:** Use the **Kill Orphan Processes** button to terminate any dangling processes if needed, and the **Open All** drop-down to expand all log groups.
{% endhint %}
{% endstep %}

{% step %}

#### **Preview Output Data**

To validate that the new component is producing correct results from the inherited offset:

* Click the downstream event (e.g., `offset_mapping_flow_event_1`).
* Navigate to the **Preview** tab.
* The **Data Preview** table displays the most recent records along with their columns and column types (e.g., `id`).

  <figure><img src="/files/8KqNvuju2dpHYFIRpayD" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
**Hint:** Use the **Filter Type** drop-down (e.g., *Latest*) to adjust the preview window, and use **Download**, **Copy**, or **Refresh** as required.
{% endhint %}
{% endstep %}
{% endstepper %}

### Best Practices

* **Always stop the source component before replacing it.** Performing Offset Mapping against a running component can result in offset drift.
* **Record the offset timestamp** of the previous component before deletion, in case you need to audit or roll back the change.
* **Verify downstream schemas match** between the old and new components. Offset Mapping only preserves consumer position — it does not reconcile schema differences.
* **Test in a non-production workspace first.** Use a development pipeline to validate that the new component correctly resumes from the mapped offset.
* **Monitor logs immediately after activation.** Watch for the *"started processing"* and *"sending data to Event"* messages to confirm the component is live and progressing from the mapped offset.
* **Do not reuse the same offset across multiple new components** unless the data flow requires a deliberate fan-out from that position.
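
The last practice can be checked with a short script if you keep your own record of which offset entry each new component was mapped to. The sketch below assumes such a record exists as a plain dictionary; the platform does not necessarily expose one, and all names are illustrative. It reports shared offsets rather than rejecting them, since a deliberate fan-out is a valid case.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical record: new component -> offset entry it was mapped to.
mappings: Dict[str, str] = {
    "SQL Component_2": "SQL Component_1-yqZV",
    "SQL Component_3": "SQL Component_1-yqZV",
}


def shared_offsets(component_to_offset: Dict[str, str]) -> Dict[str, List[str]]:
    """Group components by mapped offset and keep offsets used more than once."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for component, offset in component_to_offset.items():
        groups[offset].append(component)
    return {offset: sorted(names) for offset, names in groups.items() if len(names) > 1}


print(shared_offsets(mappings))
```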

### Troubleshooting

<table><thead><tr><th width="240.2000732421875">Issue</th><th>Possible Cause</th><th>Resolution</th></tr></thead><tbody><tr><td>Offset Mapping drop-down is empty</td><td>No prior component has committed offsets in this pipeline.</td><td>Run the original component at least once before replacement.</td></tr><tr><td>New component reprocesses old data</td><td>Offset Mapping was left blank, causing the component to start from the earliest available offset.</td><td>Open the Configuration tab and explicitly select the correct offset entry.</td></tr><tr><td>"Pipeline updation success" toast not appearing</td><td>Mandatory fields in Configuration or Meta Information are incomplete.</td><td>Verify all fields marked with <code>*</code> are populated and save again.</td></tr><tr><td>Component status shows orphaned processes</td><td>Previous driver/executor pods did not terminate cleanly.</td><td>Use the <strong>Kill Orphan Processes</strong> button in the Component Status tab.</td></tr><tr><td>Data Preview shows no records</td><td>Pipeline has not yet processed data post-activation, or the offset points to the end of the stream.</td><td>Wait for new records to arrive, or refresh the preview using the <strong>Refresh</strong> button.</td></tr></tbody></table>

Offset Mapping transforms component replacement from a disruptive operation into a seamless, state-preserving update. By binding a new component to the offset timestamp of its predecessor, the platform guarantees **exactly-once continuity** across pipeline revisions — a cornerstone of reliable, evolvable streaming data architectures.

