Resource Observability
The Resource Observability page provides visibility into the Machines and Resources that run the platform, so that administrators can monitor and optimize its performance. It helps them track infrastructure usage, detect issues, and ensure system reliability in cloud-based and distributed environments.
Machine & Resources Overview
Resource observability is critical for maintaining operational efficiency in environments with dynamic and distributed components such as:
Cloud services
Containerized workloads
Microservices architectures
The Resource Observability page offers a detailed breakdown of machine and resource metrics to monitor health, utilization, and performance trends.
By default, the page opens on the Machine tab, providing a snapshot of key system metrics.
Machine Tab
The Machine tab displays real-time infrastructure metrics and visualizations.
Metrics
Active Machines: Number of currently running machines.
CPU Capacity: Total processing power (in cores).
Memory Capacity: Total system memory (in bytes).
Number of Pods: Total count of running pods across machines.
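As an illustration of how these four headline metrics relate to per-machine data, the following is a minimal Python sketch. The Machine record, its field names, and the choice to count only running machines toward the totals are assumptions made for this example; they are not taken from the platform itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Machine:
    """Hypothetical per-machine record; field names are illustrative only."""
    name: str
    running: bool
    cpu_cores: float
    memory_bytes: int
    pod_count: int


def summarize(machines: Iterable[Machine]) -> Dict[str, float]:
    """Aggregate the headline metrics over machines that are running.

    Assumption: stopped machines do not contribute to capacity or pod totals.
    """
    active = [m for m in machines if m.running]
    return {
        "active_machines": len(active),
        "cpu_capacity_cores": sum(m.cpu_cores for m in active),
        "memory_capacity_bytes": sum(m.memory_bytes for m in active),
        "number_of_pods": sum(m.pod_count for m in active),
    }


if __name__ == "__main__":
    fleet = [
        Machine("node-a", True, 4.0, 16 * 1024**3, 12),
        Machine("node-b", True, 8.0, 32 * 1024**3, 20),
        Machine("node-c", False, 8.0, 32 * 1024**3, 0),
    ]
    print(summarize(fleet))
```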
Visualizations
CPU Allocation: Graphical distribution of CPU usage across processes.
Memory Allocation: Visualization of how memory is consumed, helping detect bottlenecks.
Machine and Kubernetes Instance Details
Each listed machine and Kubernetes instance includes the following details:
IP Address & Instances: Kubernetes-generated unique identifiers for each entity.
Readiness: Indicates whether the instance is ready.
CPU Requests and Limits: CPU reserved for the instance (requests) and the maximum it may consume (limits), in cores.
Memory Requests and Limits: Memory reserved for the instance (requests) and the maximum it may consume (limits), in bytes.
Percentage of Pods: Pods in use on the instance, expressed as a percentage of the pods allocated to it.
Creation Time: Timestamp showing when the instance was created.
Tip: Click a specific entry to view graphical details on CPU Usage, Memory Usage, and Workloads.
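Because memory values are reported in raw bytes and pod usage as a percentage, a small helper can make exported values easier to read. The sketch below is one possible approach in Python. The function names are hypothetical, and the pod percentage formula (running pods divided by pod capacity) is an assumption about how the column is derived.

```python
def format_bytes(num_bytes: int) -> str:
    """Render a byte count using binary units (KiB, MiB, GiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit available.
        if value < 1024 or unit == units[-1]:
            if unit == "B":
                return f"{int(value)} B"
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} {units[-1]}"  # not reached; keeps type checkers happy


def pod_percentage(running_pods: int, pod_capacity: int) -> float:
    """Assumed formula: running pods as a percentage of pod capacity."""
    if pod_capacity <= 0:
        return 0.0
    return round(100.0 * running_pods / pod_capacity, 1)


if __name__ == "__main__":
    print(format_bytes(8 * 1024**3))
    print(pod_percentage(22, 110))
```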
Resource Tab
The Resource tab provides resource utilization details with two views:
Live View: Displays real-time usage metrics for active components.
All View: Displays historical and current data for deeper analysis of system behavior.
Metrics
Each entry in the Resource tab includes:
Creation Date & Time: When the resource was initiated.
Age: Duration the resource has been operational.
Restarts: Number of pod or instance restarts (useful for diagnosing stability issues).
CPU and Memory Usage: Aggregated usage metrics, shown with color-coded bars (green → yellow → red as usage increases).
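The Age value and the color-coded usage bars can be reasoned about with a short Python sketch. The thresholds used here (60% for yellow, 85% for red) and the age format are assumptions chosen for illustration; the page's actual thresholds and formatting are not documented in this section.

```python
from datetime import datetime, timezone


def usage_color(used: float, capacity: float,
                warn: float = 0.60, critical: float = 0.85) -> str:
    """Map a usage ratio to a bar color.

    The warn and critical thresholds are assumed values, not the platform's.
    """
    ratio = used / capacity if capacity > 0 else 0.0
    if ratio < warn:
        return "green"
    if ratio < critical:
        return "yellow"
    return "red"


def resource_age(created_at: datetime, now: datetime) -> str:
    """Format the time elapsed since creation as a compact string."""
    total_seconds = max(0, int((now - created_at).total_seconds()))
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    if days:
        return f"{days}d{hours}h"
    if hours:
        return f"{hours}h{minutes}m"
    return f"{minutes}m"


if __name__ == "__main__":
    created = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    print(resource_age(created, datetime.now(timezone.utc)))
    print(usage_color(used=0.7, capacity=1.0))
```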
Navigation & Filtering
Search Functionality: Quickly locate entities by name.
Workload Types Filter: Focus on specific resource categories. Available options:
Pipeline Workloads
Job Workloads
DS Lab
System Components
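The combined effect of the search box and the Workload Types filter can be pictured with the following Python sketch. The ResourceEntry record, the case-insensitive substring match, and the sample names are assumptions for illustration; only the four workload type labels come from the list above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Workload type labels as listed on the Resource tab.
WORKLOAD_TYPES = (
    "Pipeline Workloads",
    "Job Workloads",
    "DS Lab",
    "System Components",
)


@dataclass
class ResourceEntry:
    """Hypothetical row of the Resource tab; field names are illustrative."""
    name: str
    workload_type: str
    restarts: int = 0


def filter_entries(entries: Iterable[ResourceEntry],
                   search: str = "",
                   workload_type: Optional[str] = None) -> List[ResourceEntry]:
    """Keep entries whose name contains the search text (case-insensitive)
    and whose workload type matches, when a type is selected."""
    needle = search.strip().lower()
    return [
        e for e in entries
        if (not needle or needle in e.name.lower())
        and (workload_type is None or e.workload_type == workload_type)
    ]


if __name__ == "__main__":
    rows = [
        ResourceEntry("sales-pipeline", "Pipeline Workloads", restarts=3),
        ResourceEntry("nightly-job", "Job Workloads"),
        ResourceEntry("sales-notebook", "DS Lab"),
    ]
    for row in filter_entries(rows, search="sales", workload_type="Pipeline Workloads"):
        print(row.name, row.restarts)
```

Sorting the filtered rows by restarts in descending order is one way to surface unstable workloads first.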
Key Benefits
Monitor infrastructure and workload performance in real time.
Detect imbalances in CPU or memory allocation.
Diagnose pod restarts and stability issues.
Optimize resource allocation for efficient system performance.