Pinot Reader

The Pinot Reader is typically used in scenarios where high-speed analytical queries need to be integrated with batch or streaming workflows running inside the BDB Platform.

The Pinot Reader component enables BDB Data Pipeline users to read data directly from Apache Pinot, a real-time distributed OLAP datastore designed for low-latency analytics. It supports high-performance analytical queries and enables seamless ingestion of real-time and offline datasets for processing, modeling, and operational workflows.

Its flexible configuration options make it suitable for a wide range of advanced analytics, BI, and ML use cases where fast query response times are essential.

Overview

Apache Pinot provides fast OLAP-style queries on real-time and offline datasets. The Pinot Reader component integrates seamlessly with Pinot’s query engine, enabling:

  • Extraction of real-time analytical datasets

  • Execution of SQL queries directly within the pipeline

  • Combining Pinot results with other data sources

  • Triggering downstream processing based on Pinot metrics

When added to the pipeline, the Pinot Reader acts as the entry (Reader) component for ingesting data from a Pinot cluster.
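The reader's interaction with the cluster can be pictured with a small sketch. Apache Pinot exposes a REST SQL endpoint that accepts a JSON payload of the form {"sql": "..."}; the helper function below is illustrative, not the component's internal API:

```python
import json

# Illustrative helper: build the HTTP request a reader could send to Pinot's
# SQL REST endpoint. The payload shape {"sql": "..."} follows Apache Pinot's
# REST API; the function name and structure here are hypothetical.
def build_sql_request(host: str, port: int, sql: str) -> tuple[str, str]:
    url = f"http://{host}:{port}/sql"    # controller-side query endpoint
    payload = json.dumps({"sql": sql})   # Pinot expects {"sql": "..."}
    return url, payload

url, body = build_sql_request(
    "pinot-controller.mycompany.com", 9000,
    "SELECT country, COUNT(*) FROM pageviews GROUP BY country",
)
```

The same pattern applies regardless of which table or query the pipeline is configured with; only the SQL string changes.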

Component Placement

You can add the Pinot Reader from:

Data Engineering → Data Pipeline → Components → Readers

After you drag the component onto the pipeline canvas, select it to open the configuration panel, which has two tabs:

  • Basic Information

  • Meta Information

The accompanying screenshot shows the Meta Information tab.

Basic Information Tab

This tab contains general component metadata such as:

  • Component Name

  • Description (optional)

These fields help users identify and document the component within complex pipelines.

Meta Information Tab

The Meta Information tab contains all fields required to connect to a Pinot controller and execute queries.

The tab includes the following fields:

Required Fields

  • Pinot Host*: Hostname or IP address of the Pinot Controller node, e.g., pinot-controller.mycompany.com.

  • Pinot Port*: Port on which the Pinot Controller query API is exposed. The default Pinot Controller port is 9000.

  • Pinot Table*: Name of the Pinot table to query, e.g., pageviews, sales_offline, or clickstream_rt.

Optional Fields

  • Fetch Size: Number of records to fetch per batch. Defaults to 1000; adjust based on dataset volume and performance requirements.

Query Section

  • Pinot Query: SQL query executed against the specified Pinot table. Supports Pinot-compatible SQL syntax.
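For example, either of the following query strings could be supplied in the Pinot Query field (the table and column names below are illustrative; the reader accepts Pinot-compatible SQL with or without a filter):

```python
# Illustrative values for the Pinot Query field; pageviews and its columns
# are example names, not part of the product.
full_scan = "SELECT user_id, page, view_time FROM pageviews LIMIT 1000"
filtered = (
    "SELECT user_id, page, view_time FROM pageviews "
    "WHERE view_time > 1700000000000 LIMIT 1000"
)
```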

Query Execution Behavior

During pipeline execution:

  1. The Pinot Reader establishes a connection with the configured Pinot Controller.

  2. It executes the user-defined query under Pinot Query.

  3. Results are fetched in batches defined by Fetch Size.

  4. The resulting dataset is passed to downstream components for processing or transformation.

If the query returns no records, the reader outputs an empty dataset.
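The batching behavior in steps 3 and 4 can be sketched as follows. This is a client-side illustration under stated assumptions, not the component's actual implementation; `fetch_size` mirrors the Fetch Size field:

```python
from typing import Iterable, Iterator

def batches(rows: Iterable[dict], fetch_size: int = 1000) -> Iterator[list[dict]]:
    """Yield rows in batches of at most fetch_size records (illustrative)."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == fetch_size:
            yield batch
            batch = []
    if batch:          # trailing partial batch
        yield batch
    # An empty result set yields no batches, matching the reader's
    # behavior of passing an empty dataset downstream.

result = list(batches([{"id": i} for i in range(5)], fetch_size=2))
```

With five input rows and a fetch size of 2, the sketch produces two full batches and one trailing batch of a single record.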

Note: The Pinot Reader supports SQL queries with or without a filter.

Common Use Cases

Real-Time Analytics Extraction

Execute queries on Pinot’s real-time tables for dashboards, anomaly detection, or event-driven decisions.

Joining Real-Time and Historical Data

Use Pinot data in combination with Data Lake, databases, or streaming sources within the pipeline.

Preprocessing for ML Pipelines

Fetch derived analytics features from Pinot to feed into machine learning training workflows.

Operational Monitoring

Extract metrics for:

  • Latency measurements

  • Request patterns

  • Event counts

  • Usage statistics

Best Practices

Pinot Host & Port

  • Always connect to the Pinot Controller, not the Broker or Server node directly.

  • Ensure network access rules allow secure API communication.

Fetch Size Optimization

  • Use smaller fetch sizes for latency-sensitive pipelines.

  • Increase fetch size for large scans to improve throughput.

Efficient Query Design

  • Prefer SELECT column subsets over SELECT *.

  • Use time-based predicates to reduce scan volume.

  • Avoid complex joins (Pinot is optimized for low-latency aggregations and filtering).
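The guidelines above can be combined in a small sketch of a query builder that selects only a column subset and bounds the scan with a time predicate (all names here are hypothetical; the epoch-millisecond timestamps assume a millisecond-granularity time column):

```python
def build_scan_query(table: str, columns: list[str],
                     time_col: str, start_ms: int, end_ms: int,
                     limit: int = 1000) -> str:
    # Select only the needed columns and apply a time-based predicate,
    # per the best practices above. No joins: Pinot is optimized for
    # low-latency filtering and aggregation.
    cols = ", ".join(columns)
    return (f"SELECT {cols} FROM {table} "
            f"WHERE {time_col} >= {start_ms} AND {time_col} < {end_ms} "
            f"LIMIT {limit}")

q = build_scan_query("clickstream_rt", ["user_id", "event"],
                     "event_time", 1700000000000, 1700003600000)
```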

Monitoring & Debugging

  • Validate connectivity via Pinot’s Swagger UI before configuring the pipeline.

  • Review Pinot logs if the pipeline reports query execution errors.
