Pinot Writer
The Pinot Writer is ideal for modern data applications requiring ultra-fast ingestion and real-time query responses.
The Pinot Writer component enables users to write processed datasets from a BDB Data Pipeline directly into Apache Pinot, a real-time OLAP datastore designed for ultra-low-latency analytics. This Writer is typically used for updating analytical tables, appending new event data, and publishing derived metrics for real-time dashboards or ad-hoc query workloads.
The component provides configurable connection parameters and write-behavior settings, allowing seamless integration with Pinot's ingestion endpoints.
Overview
Apache Pinot is optimized for:
Real-time ingestion
High-speed analytical queries
Low-latency OLAP workloads
The Pinot Writer component integrates this capability into the BDB platform by enabling pipelines to:
Write data into Pinot tables
Append new rows or perform upsert-style behavior (depending on table configuration)
Route data into Pinot’s ingestion API through the controller
The Pinot Writer serves as the pipeline’s final stage when the objective is to publish analytical results into Pinot.
Component Placement
You can add the Pinot Writer via:
Data Engineering → Pipelines → Components → Writers
Drag the component onto the canvas and select it to open two configuration tabs:
Basic Information
Meta Information
Basic Information Tab
This tab contains:
Component Name
Description (optional)
These fields allow users to logically label and document the Writer within broader pipeline workflows.
Meta Information Tab
The Meta Information tab is used to configure how the Writer connects to Pinot and how data is ingested.
Fields include the following:
Connection Parameters
Pinot Host*
The hostname or IP address of the Pinot Controller, which exposes ingestion APIs. Example: pinot-controller.mycompany.com.
Pinot Port*
The controller’s API port. Pinot commonly uses port 9000, but this may vary by deployment.
Pinot Table*
Target Pinot table for ingestion. This must match a table defined in Pinot’s schema and table configuration.
These fields establish the REST endpoint to which Pinot ingestion requests are sent.
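Before running a pipeline, it can help to confirm that the configured endpoint really is the controller and that the target table exists. Below is a minimal Python sketch, assuming the requests library; the host, port, and table name are illustrative placeholders, not Writer defaults:

```python
# Sanity-check the Pinot Controller endpoint configured in the Writer.
# Host, port, and table name below are illustrative placeholders.
import requests

PINOT_HOST = "pinot-controller.mycompany.com"
PINOT_PORT = 9000
base_url = f"http://{PINOT_HOST}:{PINOT_PORT}"

# The controller exposes a /health endpoint; a broker or server node at this
# address would not serve the controller's REST API as expected.
resp = requests.get(f"{base_url}/health", timeout=5)
resp.raise_for_status()
print("Controller health:", resp.text)  # expected: OK

# GET /tables lists all tables known to the controller, so the target
# table configured in the Writer can be validated up front.
tables = requests.get(f"{base_url}/tables", timeout=5).json()
assert "events" in tables.get("tables", []), "target Pinot table not found"
```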
Save Mode
Save Mode
Defines how records are written to Pinot. Append is the default option.
Append: Adds new rows to the target table.
Overwrite: Updates existing records when a matching ID exists; otherwise inserts new records.
Execution Behavior
During pipeline execution:
The Pinot Writer connects to the configured Pinot Controller host and port.
Each row or batch of rows from the pipeline’s upstream component is serialized for ingestion.
The Writer sends ingestion requests to Pinot using its real-time or batch ingestion API.
Pinot processes the request and stores the data based on its table configuration (e.g., append-only, upsert mode).
The pipeline logs ingestion success or failure for monitoring and debugging.
If any parsing or connection error occurs, the Writer marks the pipeline run as Failed and exposes error details in logs.
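The exact requests the Writer issues are internal to the platform, but the sketch below illustrates the kind of batch ingestion call a Pinot Controller accepts, using the controller's /ingestFromFile endpoint (suited to small offline batches). The controller URL and table name are illustrative assumptions:

```python
# Hypothetical micro-batch push through the controller's /ingestFromFile API.
import json
import requests

controller = "http://pinot-controller.mycompany.com:9000"
table = "events_OFFLINE"  # /ingestFromFile expects the name with its type suffix

rows = [{"event_id": "a1", "value": 3.14}, {"event_id": "b2", "value": 2.71}]
payload = "\n".join(json.dumps(r) for r in rows)  # JSON-lines batch

resp = requests.post(
    f"{controller}/ingestFromFile",
    params={
        "tableNameWithType": table,
        "batchConfigMapStr": json.dumps({"inputFormat": "json"}),
    },
    files={"file": ("batch.json", payload)},
    timeout=60,
)
resp.raise_for_status()  # a non-2xx response is what surfaces as a Failed run
print(resp.text)
```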
Supported Write Operations
The Pinot Writer supports:
Append Writes
Adds new records to the Pinot table. This is the default and most widely used mode.
Upsert (If Enabled in Pinot)
If the target table's schema defines primaryKeyColumns and its table config includes an upsertConfig, then Pinot automatically replaces older entries with the latest ones based on primary-key matching.
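For reference, these are the two Pinot-side settings involved, sketched as the JSON payloads (shown here as Python dicts) that would be registered through the controller's POST /schemas and POST /tables endpoints. Table and column names are hypothetical:

```python
# Schema: primaryKeyColumns marks which columns identify a unique record.
schema = {
    "schemaName": "events",
    "dimensionFieldSpecs": [{"name": "event_id", "dataType": "STRING"}],
    "metricFieldSpecs": [{"name": "value", "dataType": "DOUBLE"}],
    "primaryKeyColumns": ["event_id"],
}

# Table config: upsertConfig enables replace-by-key behavior at query time.
# Pinot upsert applies to REALTIME tables; stream, segment, and indexing
# settings are omitted here for brevity.
table_config = {
    "tableName": "events",
    "tableType": "REALTIME",
    "upsertConfig": {"mode": "FULL"},
}
```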
Batch-style Writes
Multiple rows can be written in batches for optimal throughput.
Real-time Table Ingestion
Works with tables configured for:
Mutable real-time ingestion
Kafka-based ingestion pipelines (Pinot consumes the stream and commits segments asynchronously), as sketched below
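For context, a Kafka-backed real-time table carries stream settings like the following in its Pinot table config (under tableIndexConfig.streamConfigs). The topic and broker values are illustrative; the Writer itself does not create this configuration:

```python
# Illustrative streamConfigs map for a Kafka-backed REALTIME table.
stream_configs = {
    "streamType": "kafka",
    "stream.kafka.topic.name": "events",       # hypothetical topic
    "stream.kafka.broker.list": "kafka:9092",  # hypothetical broker
    "stream.kafka.consumer.type": "lowlevel",
    "stream.kafka.consumer.factory.class.name":
        "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
    "stream.kafka.decoder.class.name":
        "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
}
```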
Common Use Cases
Real-Time Analytics Publishing
Write streaming or micro-batch processed events into Pinot for instant visibility.
OLAP Metrics Update
Push enriched or aggregated metrics for dashboarding and ad-hoc analytics.
Feature Store Updates
Publish derived features used by machine learning scoring pipelines.
Event Ingestion
Write processed log, clickstream, or IoT data into Pinot tables.
Best Practices
Endpoint Configuration
Ensure the host is the Pinot Controller, not a broker or server node.
Validate network access to the specified port.
Pinot Table Design
Create schemas that match pipeline output columns.
Avoid wide rows; Pinot's columnar storage performs best with lean, analytics-oriented schemas.
Batch & Throughput
Prefer micro-batch ingestion for large datasets.
Ensure the pipeline output schema aligns with Pinot schema types (e.g., STRING, LONG, DOUBLE).
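One way to catch type drift before ingestion is a small pre-flight check. The mapping below is an assumption for a generic pipeline type system, not an official BDB-to-Pinot conversion table:

```python
# Hypothetical pipeline-to-Pinot type mapping used for a pre-flight check.
PINOT_TYPE_MAP = {
    "string": "STRING",
    "integer": "INT",
    "long": "LONG",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "timestamp": "TIMESTAMP",
}

def schema_mismatches(pipeline_schema: dict, pinot_schema: dict) -> list:
    """Return columns whose pipeline type does not map to the Pinot type."""
    return [
        column
        for column, ptype in pipeline_schema.items()
        if pinot_schema.get(column) != PINOT_TYPE_MAP.get(ptype.lower())
    ]

# Example: flags "value" because Pinot declares it as LONG while the
# pipeline emits double.
print(schema_mismatches({"event_id": "string", "value": "double"},
                        {"event_id": "STRING", "value": "LONG"}))
```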
Operational Monitoring
Monitor ingestion logs inside Pinot to ensure segment creation and indexing.
Catch schema mismatches early to avoid ingestion failures.
Troubleshooting Guide
| Issue | Likely Cause | Resolution |
| --- | --- | --- |
| Connection refused | Wrong host or port | Verify the Pinot Controller endpoint. |
| Table not found | Incorrect table name | Check the Pinot schema and table config. |
| Schema mismatch | Missing or incompatible columns | Align the pipeline schema with the Pinot schema. |
| Ingestion timeout | Large payload or network latency | Reduce the batch size or check Pinot resource availability. |