Pinot Writer

The Pinot Writer is ideal for modern data applications requiring ultra-fast ingestion and real-time query responses.

The Pinot Writer component enables users to write processed datasets from a BDB Data Pipeline directly into Apache Pinot, a real-time OLAP datastore designed for ultra-low-latency analytics. This Writer is typically used for updating analytical tables, appending new event data, and publishing derived metrics for real-time dashboards or ad-hoc query workloads.

The component provides configurable connection parameters and write-behavior settings, allowing seamless integration with Pinot's ingestion endpoints.

Overview

Apache Pinot is optimized for:

  • Real-time ingestion

  • High-speed analytical queries

  • Low-latency OLAP workloads

The Pinot Writer component integrates this capability into the BDB platform by enabling pipelines to:

  • Write data into Pinot tables

  • Append new rows or perform upsert-style behavior (depending on table configuration)

  • Route data into Pinot’s ingestion API through the controller

The Pinot Writer serves as the pipeline’s final stage when the objective is to publish analytical results into Pinot.

Component Placement

You can add the Pinot Writer via:

Data Engineering → Pipelines → Components → Writers

Dragging the component onto the canvas and selecting it displays two tabs:

  • Basic Information

  • Meta Information


Basic Information Tab

This tab contains:

  • Component Name

  • Description (optional)

These fields allow users to logically label and document the Writer within broader pipeline workflows.

Meta Information Tab

The Meta Information tab is used to configure how the Writer connects to Pinot and how data is ingested.

Fields include the following:

Connection Parameters

  • Pinot Host*: The hostname or IP address of the Pinot Controller, which exposes the ingestion APIs. Example: pinot-controller.mycompany.com.

  • Pinot Port*: The Controller’s API port. Pinot commonly uses port 9000, but this may vary by deployment.

  • Pinot Table*: The target Pinot table for ingestion. This must match a table defined in Pinot’s schema and table configuration.

These fields establish the REST endpoint to which Pinot ingestion requests are sent.
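As a quick illustration, the sketch below shows the controller base URL that these fields resolve to. The host, port, and table name are placeholders, and the exact ingestion path the Writer calls is internal to the component.

```python
# Minimal sketch of the endpoint the connection fields resolve to.
# Host, port, and table name are placeholders, not real values.
PINOT_HOST = "pinot-controller.mycompany.com"   # Pinot Host*
PINOT_PORT = 9000                               # Pinot Port*
PINOT_TABLE = "web_events"                      # Pinot Table* (hypothetical)

controller_base = f"http://{PINOT_HOST}:{PINOT_PORT}"
print(controller_base)  # e.g. http://pinot-controller.mycompany.com:9000
```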

Save Mode

Defines how records are written to Pinot:

  • Append (default): Adds new rows to the target table.

  • Overwrite: Replaces an existing record when one with the same primary key already exists; otherwise inserts a new record.


Notes:

  • Pinot typically uses append-only ingestion; upsert functionality is available only if the table is configured with primary keys and upsert mode.

  • Save Mode reflects the ingestion semantics supported by the underlying table.

Execution Behavior

During pipeline execution:

  1. The Pinot Writer connects to the configured Pinot Controller host and port.

  2. Each row or batch of rows from the pipeline’s upstream component is serialized for ingestion.

  3. The Writer sends ingestion requests to Pinot using its real-time or batch ingestion API.

  4. Pinot processes the request and stores the data based on its table configuration (e.g., append-only, upsert mode).

  5. The pipeline logs ingestion success or failure for monitoring and debugging.

If any parsing or connection error occurs, the Writer marks the pipeline run as Failed and exposes error details in logs.
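For intuition, here is a minimal Python sketch of steps 1 through 5, assuming the controller's standard /ingestFromFile batch endpoint; the Writer's actual internals, retry logic, and logging may differ, and the host, port, and table name are placeholders.

```python
# Hedged sketch of the execution flow above, assuming the Pinot Controller's
# /ingestFromFile batch endpoint. Not the Writer's actual implementation.
import io
import json
import requests

def write_batch(rows, host="pinot-controller.mycompany.com", port=9000,
                table="web_events"):                    # hypothetical names
    base = f"http://{host}:{port}"                      # 1. controller endpoint
    payload = "\n".join(json.dumps(r) for r in rows)    # 2. serialize the batch
    resp = requests.post(                               # 3. ingestion request
        f"{base}/ingestFromFile",
        params={
            "tableNameWithType": f"{table}_OFFLINE",
            "batchConfigMapStr": json.dumps({"inputFormat": "json"}),
        },
        files={"file": ("batch.json", io.BytesIO(payload.encode("utf-8")))},
        timeout=60,
    )
    if resp.ok:                                         # 4./5. Pinot stores the
        print("Ingestion accepted:", resp.text)         # data; log the outcome
    else:
        raise RuntimeError(f"Ingestion failed: {resp.status_code} {resp.text}")
```

A production Writer would also add retries, batching limits, and authentication as required by the deployment.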

Supported Write Operations

The Pinot Writer supports:

  • Append Writes

Adds new records to the Pinot table. This is the default and most widely used mode.

  • Upsert (if enabled in Pinot)

If the target table is configured with:

  • primaryKeyColumns (in the table schema)

  • upsertConfig (in the table config)

then Pinot automatically replaces older entries with the latest ones based on primary-key matching. A configuration sketch follows this list.

  • Batch-style Writes

Multiple rows can be written in batches for optimal throughput.

  • Real-time Table Ingestion

Works with tables configured for:

  • Mutable real-time ingestion

  • Kafka-based ingestion pipelines (Pinot merges data later)
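For reference, the fragments below illustrate the two settings that the upsert path depends on. This is a minimal sketch with hypothetical table and column names; real Pinot schemas and table configs carry more fields, and Pinot upserts apply to REALTIME tables.

```python
# Illustrative fragments of the Pinot schema and table config that enable
# upserts. Table and column names are hypothetical; consult the Apache Pinot
# docs for the full config shape.
schema_fragment = {
    "schemaName": "web_events",
    "primaryKeyColumns": ["event_id"],      # keys used for upsert matching
    "dimensionFieldSpecs": [
        {"name": "event_id", "dataType": "STRING"},
        {"name": "user_id", "dataType": "STRING"},
    ],
    "metricFieldSpecs": [
        {"name": "value", "dataType": "DOUBLE"},
    ],
}

table_config_fragment = {
    "tableName": "web_events",
    "tableType": "REALTIME",                # upserts require a REALTIME table
    "upsertConfig": {"mode": "FULL"},       # latest record wins per key
}
```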

Common Use Cases

Real-Time Analytics Publishing

Write streaming or micro-batch processed events into Pinot for instant visibility.

OLAP Metrics Update

Push enriched or aggregated metrics for dashboarding and ad-hoc analytics.

Feature Store Updates

Publish derived features used by machine learning scoring pipelines.

Event Ingestion

Write processed log, clickstream, or IoT data into Pinot tables.

Best Practices

Endpoint Configuration

  • Ensure the host points to the Pinot Controller, not a broker or server node.

  • Validate network access to the specified port.

Pinot Table Design

  • Create schemas that match pipeline output columns.

  • Avoid wide rows; Pinot performs best with analytical column structures.

Batch & Throughput

  • Prefer micro-batch ingestion for large datasets.

  • Ensure the pipeline output schema aligns with Pinot schema types (e.g., STRING, LONG, DOUBLE); a type-mapping sketch follows below.
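As an illustration, one way to keep the two schemas aligned is an explicit type map. The pipeline-side type names below are hypothetical; adjust them to your pipeline's actual type system.

```python
# Hypothetical mapping from pipeline output types to Pinot data types.
TYPE_MAP = {
    "string": "STRING",
    "integer": "LONG",       # LONG is a safer default than INT
    "float": "DOUBLE",
    "boolean": "BOOLEAN",
    "timestamp": "TIMESTAMP",
}

def to_pinot_type(pipeline_type: str) -> str:
    try:
        return TYPE_MAP[pipeline_type.lower()]
    except KeyError:
        raise ValueError(f"No Pinot type mapping for {pipeline_type!r}")
```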

Operational Monitoring

  • Monitor ingestion logs inside Pinot to ensure segment creation and indexing.

  • Check for schema mismatches early to avoid ingestion failures.
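For example, segment creation can be spot-checked from outside Pinot with the controller's segments listing endpoint; the host, port, and table name below are placeholders.

```python
# Spot-check that segments are being created for a table, using the
# controller's segments listing endpoint. All names are placeholders.
import requests

def list_segments(host="pinot-controller.mycompany.com", port=9000,
                  table="web_events"):
    resp = requests.get(f"http://{host}:{port}/segments/{table}", timeout=10)
    resp.raise_for_status()
    return resp.json()   # segment names grouped by table type
```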

Troubleshooting Guide

  • Connection refused. Possible cause: wrong host or port. Recommended action: verify the Pinot Controller endpoint.

  • Table not found. Possible cause: incorrect table name. Recommended action: check the Pinot schema and table config.

  • Schema mismatch. Possible cause: missing or incompatible columns. Recommended action: align the pipeline schema with the Pinot schema.

  • Ingestion timeout. Possible cause: large payload or network latency. Recommended action: reduce the batch size or check Pinot resource availability.
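For the first item above, a quick connectivity probe can distinguish an unreachable endpoint from an application-level failure. This sketch assumes the controller's standard /health endpoint; the host and port are placeholders.

```python
# First-line check for "Connection refused", assuming the Pinot Controller's
# /health endpoint. Host and port are placeholders.
import requests

def check_controller(host="pinot-controller.mycompany.com", port=9000):
    try:
        resp = requests.get(f"http://{host}:{port}/health", timeout=5)
        print("Controller reachable:", resp.status_code, resp.text)
    except requests.exceptions.ConnectionError:
        print("Connection refused: verify the Pinot Controller host and port")
```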
