Pipeline Monitoring

The Pipeline Monitoring interface provides real-time visibility into the operational health, performance metrics, and execution status of active pipelines. This view enables data engineers and operations teams to proactively monitor resource consumption, processing statistics, and system responsiveness, ensuring the stability and reliability of data workflows.

Accessing the Monitoring Interface

  • Navigate to the Pipelines list page under the Data Engineering module.

  • From the Pipelines list, open the right-side panel of related options for the desired active pipeline.

  • Click the Pipeline Monitoring icon in the right-side panel.

Monitor Tab

The Monitor tab interface is divided into two primary sections:

Execution Summary Table (Main Panel)

This section presents execution metrics at the pipeline component level (e.g., readers, processors).

  • Name: Name of the pipeline stage/component (e.g., Sandbox Reader _1).

  • Status: Current health indicator of the component. UP (green) means the component is running successfully; OFF (gray) means it is inactive/stopped.

  • Type: Processing mode of the component (e.g., realtime for streaming pipelines).

  • Instances: Number of parallel instances or replicas running.

  • Last Processed Time: Timestamp of the last successfully processed record.

  • Last Processed Size: Size (in MB) of the most recently processed batch.

  • Last Processed Count: Number of records processed in the most recent interval.

  • Total Number of Records: Cumulative number of records processed by the component.

  • CPU Utilization: Real-time CPU consumption, shown as Used Cores / Allocated Cores.

  • Memory Utilization: Real-time memory usage, shown as Used MB / Allocated MB.
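For teams that consume these component-level metrics outside the UI (for example, in a custom alerting script), the columns above map naturally onto a small record type. The following Python sketch is a hypothetical representation of one table row, not an official API of the platform:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ComponentExecutionSummary:
    """One row of the Execution Summary table (hypothetical structure; field names mirror the columns above)."""
    name: str                      # e.g., "Sandbox Reader _1"
    status: str                    # "UP" or "OFF"
    type: str                      # e.g., "realtime"
    instances: int                 # number of parallel instances/replicas
    last_processed_time: datetime  # timestamp of the last processed record
    last_processed_size_mb: float  # size of the most recent batch, in MB
    last_processed_count: int      # records processed in the most recent interval
    total_records: int             # cumulative records processed
    cpu_used_cores: float          # "Used Cores" part of CPU Utilization
    cpu_allocated_cores: float     # "Allocated Cores" part of CPU Utilization
    memory_used_mb: float          # "Used MB" part of Memory Utilization
    memory_allocated_mb: float     # "Allocated MB" part of Memory Utilization

    @property
    def is_idle(self) -> bool:
        """Simple heuristic: the component is UP but processed nothing in the last interval."""
        return self.status == "UP" and self.last_processed_count == 0
```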

Pipeline Metadata Summary (Sidebar Panel)

This sidebar presents a quick snapshot of pipeline-level operational metadata and remains visible across all monitoring tabs.

  • Pipeline ID: Unique identifier for the pipeline instance (e.g., dp_17478933547263177).

  • Pipeline Name: User-defined name of the pipeline (e.g., testpipeline).

  • Pipeline Status: Current state of the pipeline, e.g., Running (green) when the pipeline is active and executing.

  • Last Activated: Date and timestamp when the pipeline was most recently activated.

  • Last Deactivated: Date and timestamp when the pipeline was last stopped.

  • Total CPU Utilization (Core): Total CPU usage at the pipeline level, visualized as a progress bar of actual vs. allocated usage (e.g., 1.090 / 1.100).

  • Total Memory Utilization (MB): Total memory usage at the pipeline level in MB, visualized the same way (e.g., 972.820 / 2048).

Please note: CPU and memory utilization bars are color-coded for quick diagnostics:

  • Red indicates usage nearing the allocated limit (for example, CPU consumption close to the allocation).

  • Green indicates stable usage well within the allocation.

Monitor tab (the default tab displayed)
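The same red/green logic can be approximated outside the UI from the raw used/allocated values shown in the sidebar. The sketch below is a minimal illustration; the 90% threshold is an assumption, not a documented platform cut-off:

```python
def utilization_color(used: float, allocated: float, limit_ratio: float = 0.9) -> str:
    """Map a used/allocated pair to the red/green coding described above.

    The 0.9 threshold is an illustrative assumption, not a documented value.
    """
    if allocated <= 0:
        return "gray"  # nothing allocated: treat as inactive
    return "red" if used / allocated >= limit_ratio else "green"


# Values taken from the sidebar example above.
print(utilization_color(1.090, 1.100))   # red: CPU is at ~99% of its allocation
print(utilization_color(972.820, 2048))  # green: memory is at ~47% of its allocation
```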

Key Use Cases

The Pipeline Monitoring feature is valuable in the following scenarios, enabling prompt and informed action:

  • Real-Time Health Monitoring: Instantly identify overutilization, idle stages, or inactive pipeline components.

  • Performance Optimization: Fine-tune resource allocation based on live metrics.

  • Operational Auditing: Maintain visibility of processing time, data throughput, and resource trends.

  • Root Cause Analysis (RCA): Identify failing or lagging pipeline components through system indicators.

Best Practices

Platform users can follow these best practices to maximize the effectiveness of the Monitoring functionality.

  • Regularly monitor CPU and memory utilization to avoid system overload.

  • Investigate status changes (e.g., UP to OFF) immediately to ensure pipeline reliability.

  • Ensure processing components show regular update timestamps in the Last Processed Time field.
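The last practice can be automated with a simple staleness check over the Last Processed Time values. The sketch below assumes the timestamps have already been collected (for example, copied from the Monitor tab or gathered by an external script); the 15-minute threshold is an arbitrary example:

```python
from datetime import datetime, timedelta, timezone


def stale_components(last_processed: dict[str, datetime],
                     max_age: timedelta = timedelta(minutes=15)) -> list[str]:
    """Return the components whose Last Processed Time is older than max_age."""
    now = datetime.now(timezone.utc)
    return [name for name, ts in last_processed.items() if now - ts > max_age]


# Example usage with hypothetical values.
timestamps = {
    "Sandbox Reader _1": datetime.now(timezone.utc) - timedelta(minutes=2),
    "SQL Component_1": datetime.now(timezone.utc) - timedelta(hours=1),
}
print(stale_components(timestamps))  # ['SQL Component_1']
```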

Please note:

  • All metrics shown are updated in near real-time based on streaming telemetry from the underlying orchestrator (e.g., Kubernetes, Spark).

  • Metrics are reset on pipeline restart or reset.

Data Metrics

The Data Metrics section provides comprehensive visual insight into the data flow, performance, and throughput of individual pipeline components. It is designed to help users track data consumption, production, failure rates, and system resource usage over time, enabling early detection of anomalies, lag, or resource saturation during execution.

Each pipeline "component" or "node" is displayed with a performance chart showing its data ingestion and processing behavior.

Displayed Information

  • Component Name: The identifier of the pipeline component (e.g., Sandbox Reader _1, SQL Component_1).

  • Consumed (Green): Number of records/data units successfully read or ingested.

  • Produced (Blue): Number of records/data units emitted or written.

  • Failed (Red): Number of records that failed processing.

  • Lag: Delay in record processing, if applicable (typically relevant in streaming contexts).

  • Bars (Histogram): Timeline view of the metrics over the selected interval (default: 30 minutes).

Data Metrics tab

Please note:

  • Use the Show all components toggle to visualize every pipeline stage.

  • Use the Refresh button to fetch the most recent data.

  • Use the Interval selector (e.g., 30 Min) to change the metric granularity.
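Because each chart exposes Consumed, Produced, and Failed counts per interval, simple derived indicators such as failure rate or a rough backlog estimate can be computed from the same numbers. The sketch below is illustrative only; the counts are hypothetical and are not read from any documented API:

```python
def failure_rate(consumed: int, failed: int) -> float:
    """Fraction of consumed records that failed processing in an interval."""
    return failed / consumed if consumed else 0.0


def throughput_gap(consumed: int, produced: int) -> int:
    """Records consumed but not (yet) produced in the interval; a rough lag indicator."""
    return consumed - produced


# Hypothetical counts for a single 30-minute interval.
consumed, produced, failed = 12_000, 11_950, 50
print(f"failure rate: {failure_rate(consumed, failed):.2%}")    # 0.42%
print(f"throughput gap: {throughput_gap(consumed, produced)}")  # 50
```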

System Logs

The System Logs tab is an essential diagnostic component of the pipeline monitoring suite. It provides real-time visibility into the internal operations, events, and statuses of all components within the selected data pipeline. These logs enable data engineers, site reliability engineers (SREs), and DevOps teams to troubleshoot runtime issues, optimize performance, and ensure system stability.

The System Logs tab allows deep inspection of pipeline behavior by combining log analysis with real-time metrics and modular filtering options. Pipeline users gain the transparency required to maintain reliable, high-throughput data pipelines.

Log View Panel (Main Section)

This central section displays a chronological list of logs generated by the pipeline components. Each log entry typically contains:

  • Timestamp (ISO 8601) – Denotes the exact UTC time at which the log entry was generated.

  • Thread/Process Name – For example, [kubernetes-executor-snapshots-subscribers-0].

  • Log Level – Such as DEBUG, INFO, WARN, or ERROR.

  • Log Message – A detailed description of the runtime activity or system status. Example:

[kubernetes-executor-snapshots-subscribers-0] DEBUG org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator - ResourceProfile Id: 0
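Because each entry follows the thread/level/logger/message pattern shown above, downloaded logs can be parsed with a small regular expression for offline analysis. The pattern below is a best-effort sketch based on the example entry; real log lines may vary (for instance, some include a leading ISO 8601 timestamp):

```python
import re

# Matches entries shaped like the example above:
# [kubernetes-executor-snapshots-subscribers-0] DEBUG org.apache...ExecutorPodsAllocator - ResourceProfile Id: 0
LOG_PATTERN = re.compile(
    r"\[(?P<thread>[^\]]+)\]\s*"
    r"(?P<level>DEBUG|INFO|WARN|ERROR)\s+"
    r"(?P<logger>\S+)\s+-\s+"
    r"(?P<message>.*)"
)

line = ("[kubernetes-executor-snapshots-subscribers-0] DEBUG "
        "org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator - ResourceProfile Id: 0")

match = LOG_PATTERN.match(line)
if match:
    print(match.group("level"))    # DEBUG
    print(match.group("message"))  # ResourceProfile Id: 0
```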

Controls and Filters (Top Section)

  • Selected Pod Dropdown: Allows you to filter logs for a specific Kubernetes pod or container instance associated with a pipeline component (e.g., sandbox-reader--1-tbcc...). This is helpful in distributed environments where multiple pods handle different stages of the pipeline.

  • Start Date Picker: Enables time-based filtering of logs for focused troubleshooting (e.g., investigating issues after a recent deployment or failure).

  • Refresh Button: Fetches the latest logs without reloading the full UI, ideal for real-time monitoring during pipeline execution.

  • Download Icon: Exports logs as a file for external analysis or archiving.

Pagination Controls (Bottom Section)

  • Allows navigation through large sets of log entries.

  • Helpful for in-depth root cause analysis and tracking log trends across time.

Use Cases for System Logs

  • Debugging Runtime Errors: Quickly locate and analyze exceptions or failures using ERROR logs.

  • Monitoring Resource Allocation: Inspect messages from Spark or K8s about pod allocation or executor behavior.

  • Auditing and Compliance: Export logs for traceability and reporting.

  • Performance Optimization: Identify lags, timeouts, or processing bottlenecks at the component level.

Best Practices

  • Filter by Pod when troubleshooting a specific stage or component in the selected pipeline.

  • Use Start Date to narrow down to the relevant execution window.

  • Monitor Log Levels:

    • DEBUG for development and test environments.

    • INFO/WARN/ERROR in production to limit noise.

  • Automate Log Exports for integration with centralized logging systems (e.g., ELK Stack, Datadog, or CloudWatch).

  • Correlate with CPU/Memory Metrics to identify resource-driven failures or spikes.
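As a complement to the export and correlation practices above, a log file downloaded via the Download icon can be pre-filtered by level before being forwarded to a centralized logging system. The snippet below is a generic sketch over a plain-text log file; the file name is hypothetical and no platform-specific API is involved:

```python
from pathlib import Path


def filter_by_level(log_path: str, levels: tuple[str, ...] = ("WARN", "ERROR")) -> list[str]:
    """Keep only the lines whose log level is one of `levels`.

    Assumes levels appear surrounded by spaces, as in the log format shown above.
    """
    kept = []
    for line in Path(log_path).read_text(encoding="utf-8").splitlines():
        if any(f" {level} " in line for level in levels):
            kept.append(line)
    return kept


# Example: reduce a downloaded log file (hypothetical name) to warnings and errors before shipping it.
# warnings_and_errors = filter_by_level("pipeline_system_logs.txt")
```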