Resource Observability

The Resource Observability page gives administrators visibility into the machines and resources that run the platform. Use it to track infrastructure usage, detect issues, and maintain system reliability in cloud-based and distributed environments.

Machine & Resources Overview

Resource observability is critical for maintaining operational efficiency in environments with dynamic and distributed components such as:

  • Cloud services

  • Containerized workloads

  • Microservices architectures

The Resource Observability page offers a detailed breakdown of machine and resource metrics to monitor health, utilization, and performance trends.

By default, the page opens on the Machine tab, providing a snapshot of key system metrics.

Machine Tab

The Machine tab displays real-time infrastructure metrics and visualizations.

Metrics

  • Active Machines: Number of currently running machines.

  • CPU Capacity: Total processing power (in cores).

  • Memory Capacity: Total system memory (in bytes).

  • Number of Pods: Total count of running pods across machines.
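The four headline metrics above are simple aggregates over the machine list. The sketch below shows one way such totals could be derived. The `Machine` record, its field names, and the choice to count only running machines toward capacity are assumptions made for illustration, not the platform's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical machine record; field names are illustrative only."""
    name: str
    running: bool
    cpu_cores: float
    memory_bytes: int
    running_pods: int


def summarize(machines):
    """Aggregate the headline metrics over the machines that are running.

    Assumption: capacity totals cover active machines only.
    """
    active = [m for m in machines if m.running]
    return {
        "active_machines": len(active),
        "cpu_capacity_cores": sum(m.cpu_cores for m in active),
        "memory_capacity_bytes": sum(m.memory_bytes for m in active),
        "number_of_pods": sum(m.running_pods for m in active),
    }
```

Calling `summarize` with two running machines of 4 and 8 cores would report a CPU Capacity of 12 cores, which is the kind of figure the Machine tab shows at a glance.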

Visualizations

  • CPU Allocation: Graphical distribution of CPU usage across processes.

  • Memory Allocation: Visualization of how memory is consumed, helping detect bottlenecks.

Machine and Kubernetes Instance Details

Each listed machine and Kubernetes instance includes the following details:

  • IP Address & Instances: Kubernetes-generated unique identifiers for each entity.

  • Readiness: Indicates whether the instance is ready.

  • CPU Requests and Limits: CPU reserved for the instance (request) and the maximum it may use (limit), in cores.

  • Memory Requests and Limits: Memory reserved for the instance (request) and the maximum it may use (limit), in bytes.

  • Percentage of Pods: Utilization percentage of pods relative to allocation.

  • Creation Time: Timestamp showing when the instance was created.

Tip: Click a specific entry to view graphical details on CPU Usage, Memory Usage, and Workloads.
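Because memory values are reported in raw bytes, it can help to convert them to binary units when reading the table. The sketch below also shows one plausible reading of Percentage of Pods: running pods divided by the number of pods the instance can host. Both helper functions and that interpretation are assumptions for illustration; the page itself may compute the percentage differently.

```python
def format_bytes(n: int) -> str:
    """Render a byte count in binary units (KiB, MiB, GiB, TiB)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit available.
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


def pod_percentage(running_pods: int, pod_capacity: int) -> float:
    """Assumed definition: running pods as a percentage of pod capacity."""
    if pod_capacity <= 0:
        return 0.0
    return 100.0 * running_pods / pod_capacity
```

For example, a memory limit of 2147483648 bytes reads as 2.0 GiB, and 55 running pods on an instance that can host 110 comes out at 50 percent under this assumed definition.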

Resource Tab

The Resource tab provides resource utilization details with two views:

  • Live View: Displays real-time usage metrics for active components.

  • All View: Displays historical and current data for deeper analysis of system behavior.

Metrics

Each entry in the Resource tab includes:

  • Creation Date & Time: When the resource was initiated.

  • Age: Duration the resource has been operational.

  • Restarts: Number of pod or instance restarts (useful for diagnosing stability issues).

  • CPU and Memory Usage: Aggregated usage metrics, shown with color-coded bars (green → yellow → red as usage increases).

The tab also provides two controls for narrowing the list:

  • Search: Quickly locate entities by name.

  • Workload Types Filter: Focus on specific resource categories. Available options:

    • Pipeline Workloads

    • Job Workloads

    • DS Lab

    • System Components
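The color-coded usage bars and the Restarts count lend themselves to simple rules of thumb. The sketch below maps a usage percentage to a color band and normalizes restarts by age so that long-lived and short-lived resources can be compared. The 60 and 85 percent thresholds and both function names are hypothetical; the page does not document the cutoffs it uses.

```python
SECONDS_PER_DAY = 86400


def usage_color(percent: float, warn: float = 60.0, critical: float = 85.0) -> str:
    """Map a usage percentage to a bar color.

    The warn and critical thresholds are illustrative assumptions.
    """
    if percent >= critical:
        return "red"
    if percent >= warn:
        return "yellow"
    return "green"


def restarts_per_day(restarts: int, age_seconds: float) -> float:
    """Normalize a restart count by the resource's age in days."""
    if age_seconds <= 0:
        return 0.0
    return restarts / (age_seconds / SECONDS_PER_DAY)
```

A resource with 6 restarts over two days averages 3 restarts per day, which is usually a stronger stability signal than the same count spread over several months.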

Key Benefits

  • Monitor infrastructure and workload performance in real time.

  • Detect imbalances in CPU or memory allocation.

  • Diagnose pod restarts and stability issues.

  • Optimize resource allocation for efficient system performance.
