Pinot Reader
The Pinot Reader component enables BDB Data Pipeline users to read data directly from Apache Pinot, a real-time distributed OLAP datastore designed for low-latency analytics. It supports high-performance analytical queries and enables seamless ingestion of real-time and offline datasets for processing, modeling, and operational workflows.
The Pinot Reader is typically used in scenarios where high-speed analytical queries need to be integrated with batch or streaming workflows running inside the BDB Platform.
Its flexible configuration options make it suitable for a wide range of advanced analytics, BI, and ML use cases where fast query response times are essential.
Overview
Apache Pinot provides fast OLAP-style queries on real-time and offline datasets. The Pinot Reader component integrates seamlessly with Pinot’s query engine, enabling:
Extraction of real-time analytical datasets
Execution of SQL queries directly within the pipeline
Combining Pinot results with other data sources
Triggering downstream processing based on Pinot metrics
When added to the pipeline, the Pinot Reader acts as the entry (Reader) component for ingesting data from a Pinot cluster.
Component Placement
You can add the Pinot Reader from:
Data Engineering → Data Pipeline → Components → Readers
After dragging the component onto the pipeline canvas, selecting it opens the configuration panel with two tabs:
Basic Information
Meta Information
The image provided reflects the Meta Information tab.
Basic Information Tab
This tab contains general component metadata such as:
Component Name
Description (optional)
These fields help users identify and document the component within complex pipelines.
Meta Information Tab
The Meta Information tab contains all fields required to connect to a Pinot controller and execute queries.
The fields shown in the provided UI include:
Required Fields
Pinot Host*
Hostname or IP address of the Pinot Controller node. Example: pinot-controller.mycompany.com
Pinot Port*
Port number on which the Pinot Controller query API is exposed. The default Pinot Controller port is 9000.
Pinot Table*
The name of the Pinot table to query (e.g., pageviews, sales_offline, clickstream_rt).
Optional Fields
Fetch Size (1000)
Defines the number of records to fetch per batch. Default is 1000. Adjust based on dataset volume and performance considerations.
Query Section
Pinot Query
SQL query that will be executed against the specified Pinot table. Supports Pinot-compatible SQL syntax.
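To make the field list above concrete, here is a minimal Python sketch of the same settings with basic validation. The class name PinotReaderConfig and its attribute names are hypothetical and exist only for illustration; the component itself is configured through the UI, and its internal representation is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinotReaderConfig:
    """Hypothetical mirror of the Meta Information fields (illustration only)."""

    host: str               # Pinot Host* (required)
    port: int               # Pinot Port* (required), e.g. 9000
    table: str              # Pinot Table* (required)
    query: str = ""         # Pinot Query
    fetch_size: int = 1000  # Fetch Size (optional, default 1000)

    def __post_init__(self) -> None:
        # Reject missing required fields, as the UI asterisks imply.
        if not self.host.strip():
            raise ValueError("Pinot Host is required")
        if not 1 <= self.port <= 65535:
            raise ValueError("Pinot Port must be between 1 and 65535")
        if not self.table.strip():
            raise ValueError("Pinot Table is required")
        if self.fetch_size <= 0:
            raise ValueError("Fetch Size must be a positive integer")


config = PinotReaderConfig(
    host="pinot-controller.mycompany.com",
    port=9000,
    table="pageviews",
    query="SELECT * FROM pageviews LIMIT 100",
)
print(config.fetch_size)
```

Leaving fetch_size unset falls back to 1000, matching the default shown in the UI.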
Query Execution Behavior
During pipeline execution:
The Pinot Reader establishes a connection with the configured Pinot Controller.
It executes the query entered in the Pinot Query field.
Results are fetched in batches defined by Fetch Size.
The resulting dataset is passed to downstream components for processing or transformation.
If the query returns no records, the reader outputs an empty dataset.
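The steps above can be sketched in Python. This is an illustration under stated assumptions, not the component's actual implementation: it assumes the query is sent as JSON to the Pinot Controller's /sql HTTP endpoint, that the response carries a resultTable with a dataSchema and rows, and that batching is done on the client side. The function names are hypothetical, and the sample response is made-up data so the sketch runs without a cluster.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple


def build_sql_request(host: str, port: int, query: str) -> Tuple[str, bytes]:
    """Build the URL and JSON body for a query sent to the controller (assumed /sql endpoint)."""
    url = f"http://{host}:{port}/sql"
    body = json.dumps({"sql": query}).encode("utf-8")
    return url, body


def rows_from_response(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a Pinot-style resultTable into a list of column-name -> value records."""
    table = response.get("resultTable")
    if not table:
        return []  # no records: the reader outputs an empty dataset
    columns = table["dataSchema"]["columnNames"]
    return [dict(zip(columns, row)) for row in table.get("rows", [])]


def in_batches(rows: List[Dict[str, Any]], fetch_size: int = 1000) -> Iterator[List[Dict[str, Any]]]:
    """Yield records in chunks of at most fetch_size."""
    if fetch_size <= 0:
        raise ValueError("fetch_size must be positive")
    for start in range(0, len(rows), fetch_size):
        yield rows[start:start + fetch_size]


# Made-up response used only to exercise the parsing and batching logic.
sample_response = {
    "resultTable": {
        "dataSchema": {
            "columnNames": ["country", "views"],
            "columnDataTypes": ["STRING", "LONG"],
        },
        "rows": [["IN", 120], ["US", 95], ["DE", 40]],
    }
}

records = rows_from_response(sample_response)
batches = list(in_batches(records, fetch_size=2))
print(len(batches))
```

With three records and a fetch size of 2, downstream components would receive two batches: one with two records and one with the remaining record.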
Common Use Cases
Real-Time Analytics Extraction
Execute queries on Pinot’s real-time tables for dashboards, anomaly detection, or event-driven decisions.
Joining Real-Time and Historical Data
Use Pinot data in combination with Data Lake, databases, or streaming sources within the pipeline.
Preprocessing for ML Pipelines
Fetch derived analytics features from Pinot to feed into machine learning training workflows.
Operational Monitoring
Extract metrics for:
Latency measurements
Request patterns
Event counts
Usage statistics
Best Practices
Pinot Host & Port
Always connect to the Pinot Controller, not the Broker or Server node directly.
Ensure network access rules allow secure API communication.
Fetch Size Optimization
Use smaller fetch sizes for latency-sensitive pipelines.
Increase fetch size for large scans to improve throughput.
Efficient Query Design
Prefer SELECT column subsets over SELECT *.
Use time-based predicates to reduce scan volume.
Avoid complex joins (Pinot is optimized for low-latency aggregations and filtering).
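As an illustration of these query design guidelines, the following Python sketch builds a query that selects an explicit column subset and restricts the scan with a time-range predicate. The helper name build_pinot_query, the columns country and views, and the time column event_time_ms are hypothetical; substitute the names from your own Pinot table schema.

```python
import re
from typing import Sequence

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_identifier(name: str) -> str:
    """Allow only simple identifiers so values cannot alter the query structure."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def build_pinot_query(
    table: str,
    columns: Sequence[str],
    time_column: str,
    start_ms: int,
    end_ms: int,
    limit: int = 1000,
) -> str:
    """Build a SELECT with a column subset and a half-open time window [start_ms, end_ms)."""
    if not columns:
        raise ValueError("list the columns explicitly instead of using SELECT *")
    if start_ms >= end_ms:
        raise ValueError("start_ms must be earlier than end_ms")
    column_list = ", ".join(_check_identifier(c) for c in columns)
    table = _check_identifier(table)
    time_column = _check_identifier(time_column)
    return (
        f"SELECT {column_list} FROM {table} "
        f"WHERE {time_column} >= {int(start_ms)} AND {time_column} < {int(end_ms)} "
        f"LIMIT {int(limit)}"
    )


query = build_pinot_query(
    table="pageviews",
    columns=["country", "views"],
    time_column="event_time_ms",
    start_ms=1700000000000,
    end_ms=1700003600000,
    limit=500,
)
print(query)
```

The resulting string can be pasted into the Pinot Query field. Keeping the time window narrow limits how much data the query has to scan.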
Monitoring & Debugging
Validate connectivity via Pinot’s Swagger UI before configuring the pipeline.
Review Pinot logs if the pipeline reports query execution errors.