# Resource Provisioning

## Resource Configuration per Component

Each component in a pipeline can be configured with **resource allocation settings** that match its workload requirements. Once a pipeline and its components are saved, each component inherits the **default pipeline configuration settings** (Low, Medium, and High).

After the pipeline is saved, the **Configuration** tab becomes available within the component interface, enabling fine-grained tuning of compute and memory resources.

Two primary **deployment types** are supported: **Docker** and **Spark**, each offering distinct configuration parameters.

## Deployment Types and Resource Configuration

### Docker Deployment

When deploying a component using Docker, resource configuration can be specified under the **Configuration** tab.

* **CPU (Cores)**
  * Defines the number of CPU cores allocated to the container.
  * *Note*: In Docker configuration, the value is expressed in thousandths of a core: `1000` = 1 core, so `100` = 0.1 core.
* **Memory (RAM)**
  * Specifies the memory allocated to the container.
  * *Note*: In Docker configuration, the value is expressed in MB: `1024` = 1 GB.
* **Instances**
  * Defines the number of container instances deployed for parallel processing.
  * If *N instances* are specified, N pods are deployed.
* **Requests and Limits**
  * Docker components support configuration of both **resource requests** (minimum guaranteed resources) and **limits** (maximum allowed resources), ensuring predictable performance and avoiding resource contention.
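
The unit conventions and the request/limit relationship described above can be sanity-checked with a small helper. This is a minimal sketch, not platform code: the function names and the `DockerResources` class are hypothetical, memory is assumed to be entered in MB, and the rule that a request must not exceed its limit is an assumption borrowed from how container orchestrators generally treat requests and limits.

```python
from dataclasses import dataclass


def cpu_units_to_cores(units: int) -> float:
    """Convert the Docker CPU field value to cores (1000 units = 1 core)."""
    return units / 1000


def memory_units_to_gb(units: int) -> float:
    """Convert the Docker memory field value to GB (1024 units = 1 GB)."""
    return units / 1024


@dataclass
class DockerResources:
    """Hypothetical holder for values entered in the Configuration tab."""
    cpu_request: int      # thousandths of a core
    cpu_limit: int        # thousandths of a core
    memory_request: int   # MB (assumed unit)
    memory_limit: int     # MB (assumed unit)
    instances: int = 1    # N instances -> N pods

    def validate(self) -> None:
        """Raise ValueError if a request exceeds its limit (assumed rule)."""
        if self.cpu_request > self.cpu_limit:
            raise ValueError("CPU request exceeds CPU limit")
        if self.memory_request > self.memory_limit:
            raise ValueError("memory request exceeds memory limit")
        if self.instances < 1:
            raise ValueError("at least one instance is required")

    def total_requested_cores(self) -> float:
        """Cores requested across all pods of this component."""
        return cpu_units_to_cores(self.cpu_request) * self.instances
```

For example, a component with a CPU request of `500` and three instances requests half a core per pod across three pods.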

### Spark Deployment

When deploying a component using **Apache Spark**, resource allocation applies at both the **driver** and **executor** levels.

* **Driver Configuration**
  * **Driver CPU and Memory**: Specifies the cores and memory allocated to the Spark driver, which manages the Spark context and coordinates job execution.
* **Executor Configuration**
  * **Executor CPU and Memory**: Specifies the cores and memory allocated to Spark executors, which execute tasks in parallel.
  * **Instances**: Defines the number of executors. If *N executors* are configured, N executor pods are deployed.
* **Minimum Requirements**
  * As of the current release, the **minimum driver requirement** is 0.1 cores.
  * The **minimum executor requirement** is 1 core.
  * These values may change in upcoming Spark versions.

Spark resource configuration enables **fine-grained tuning** for distributed workloads, maximizing **scalability and efficiency**.
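
The minimum requirements listed above can be expressed as a simple pre-flight check. The sketch below is illustrative only: the function names are hypothetical, the thresholds are copied from this page and may change between releases, and the total-core calculation assumes one driver plus N executors.

```python
from typing import List

MIN_DRIVER_CORES = 0.1    # minimum stated on this page for the current release
MIN_EXECUTOR_CORES = 1.0  # minimum stated on this page for the current release


def check_spark_resources(driver_cores: float,
                          executor_cores: float,
                          executors: int) -> List[str]:
    """Return a list of problems; an empty list means the values pass."""
    problems = []
    if driver_cores < MIN_DRIVER_CORES:
        problems.append(f"driver needs at least {MIN_DRIVER_CORES} cores")
    if executor_cores < MIN_EXECUTOR_CORES:
        problems.append(f"each executor needs at least {MIN_EXECUTOR_CORES} core")
    if executors < 1:
        problems.append("at least one executor is required")  # assumed rule
    return problems


def total_spark_cores(driver_cores: float,
                      executor_cores: float,
                      executors: int) -> float:
    """Cores used by one driver plus N executor pods."""
    return driver_cores + executor_cores * executors
```

Running the check before saving a configuration surfaces undersized drivers or executors early, rather than at deployment time.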

### Comparison: Docker vs. Spark Resource Configuration

<table data-header-hidden><thead><tr><th width="149"></th><th></th><th></th></tr></thead><tbody><tr><td><strong>Aspect</strong></td><td><strong>Docker Deployment</strong></td><td><strong>Spark Deployment</strong></td></tr><tr><td><strong>Resource Scope</strong></td><td>Configured at the <strong>container</strong> level for each component.</td><td>Configured at the <strong>driver</strong> and <strong>executor</strong> levels for distributed workloads.</td></tr><tr><td><strong>CPU Allocation</strong></td><td>- Defined in cores.<br>- <code>1000 = 1 core</code>, <code>100 = 0.1 core</code>.</td><td>- Driver CPU: Allocated to Spark driver for coordination.<br>- Executor CPU: Allocated per executor.</td></tr><tr><td><strong>Memory Allocation</strong></td><td>- Defined in MB.<br>- <code>1024 = 1 GB</code>.</td><td>- Driver Memory: For Spark driver context.<br>- Executor Memory: For each executor task.</td></tr><tr><td><strong>Instances</strong></td><td>- Defines number of pods deployed for parallelism.<br>- <em>N instances = N pods</em>.</td><td>- Defines number of executor pods.<br>- <em>N executors = N pods</em>.</td></tr><tr><td><strong>Minimum Requirements</strong></td><td>- CPU: 0.1 core.<br>- Memory: No strict limit, defined by user.</td><td>- Driver: Minimum 0.1 cores.<br>- Executor: Minimum 1 core.</td></tr><tr><td><strong>Requests &#x26; Limits</strong></td><td>- Supports both <strong>resource requests</strong> (min guaranteed) and <strong>limits</strong> (max allowed).</td><td>- Resource requests defined via <strong>driver/executor configs</strong> for fine-grained cluster control.</td></tr><tr><td><strong>Parallel Processing</strong></td><td>Achieved via multiple <strong>container instances (pods)</strong>.</td><td>Achieved via <strong>executors</strong>, each handling tasks in parallel.</td></tr><tr><td><strong>Optimization</strong></td><td>- Best for <strong>lightweight workloads</strong> and <strong>containerized tasks</strong>.</td><td>- Best for <strong>distributed data processing</strong>, <strong>ML training</strong>.</td></tr></tbody></table>

## Best Practice Recommendations

### When to Use **Docker Deployment**

* [x] **Lightweight Workloads** – Small-scale ETL, API services, and micro-tasks.
* [x] **Containerized Microservices** – Stateless workloads that do not require distributed execution.
* [x] **Rapid Iteration & Testing** – Prototyping and pipeline validation.
* [x] **Resource-Constrained Tasks** – Predictable workloads with minimal CPU/memory.

### When to Use **Spark Deployment**

* [x] **Distributed Data Processing** – Large-scale ETL, aggregation, and transformation.
* [x] **Machine Learning & AI** – Training and inference of models requiring parallel computing.
* [x] **Streaming Analytics** – High-volume streaming and near real-time pipelines.
* [x] **Scalable Workloads** – Jobs requiring elastic scaling across multiple executors.

### General Best Practices

* [x] **Right-Size Resources** – Start small, monitor performance, and scale incrementally.
* [x] **Use Node Pools Strategically** – Assign Spark jobs to high-performance/GPU-enabled pools, Docker tasks to cost-optimized pools.
* [x] **Enable Intelligent Scaling** – Enable it only when the maximum number of instances is greater than the minimum.
* [x] **Adopt GitOps with FluxCD** – Version-control all resource configurations to ensure auditability and consistency.

By applying these configurations and best practices, the BDB Platform ensures **flexibility, cost efficiency, and scalability** across diverse data and AI workloads.

## Optimization Configuration Fields

### Node Pool Selection

The **Node Pool** option provides control over the **execution environment** of individual components by specifying the compute node group where the component runs.

**Key Benefits**

1. **Workload Isolation** – Assign components to node pools based on criticality, performance, or security policies (e.g., secure node pools for sensitive workloads).
2. **Optimized Resource Allocation** – Place compute-intensive tasks (e.g., ML training) on GPU-enabled pools, and lightweight tasks on cost-efficient pools.
3. **Cost Management** – Assign non-critical workloads to low-cost node pools (e.g., spot instances) and reserve premium resources for high-priority jobs.
4. **Performance Optimization** – Minimize latency by routing heavy compute workloads to high-throughput node pools.
5. **Environment-Specific Execution** – Leverage custom node pools with preinstalled libraries and dependencies for specific workloads, reducing setup overhead.
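
As an illustration of how these benefits translate into a placement policy, the sketch below maps workload traits to a node pool. Everything in it is hypothetical: the `Workload` fields, the pool names, and the priority order are assumptions made for the example, not names or rules defined by the platform.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a pipeline component's needs."""
    name: str
    sensitive: bool = False   # subject to security policies
    needs_gpu: bool = False   # compute-intensive, e.g. ML training
    critical: bool = True     # False for jobs that tolerate interruption


def pick_node_pool(workload: Workload) -> str:
    """Return an illustrative pool name; isolation is checked first."""
    if workload.sensitive:
        return "secure-pool"      # workload isolation
    if workload.needs_gpu:
        return "gpu-pool"         # optimized resource allocation
    if not workload.critical:
        return "spot-pool"        # cost management
    return "standard-pool"        # default for everything else
```

Checking isolation before performance or cost reflects one possible priority order; a real policy may weigh these differently.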

### Intelligent Scaling

**Intelligent Scaling** dynamically adjusts resource allocation based on workload demands and system performance metrics.

* **Enable Intelligent Scaling** to optimize utilization and execution efficiency.

{% hint style="info" %}
**Recommendation**: Enable this option **only if maximum instances > minimum instances**. With equal values there is no range to scale within, so scaling has no effect.
{% endhint %}
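
The recommendation above reduces to a single comparison. The following minimal sketch uses a hypothetical function name and assumes that the minimum is at least one instance and never exceeds the maximum.

```python
def should_enable_intelligent_scaling(min_instances: int, max_instances: int) -> bool:
    """Return True only when there is a range to scale within."""
    if min_instances < 1:
        raise ValueError("minimum instances must be at least 1")  # assumed rule
    if max_instances < min_instances:
        raise ValueError("maximum instances must not be below the minimum")
    return max_instances > min_instances
```

A component configured with a minimum of 2 and a maximum of 2 instances returns `False`, because its instance count is effectively fixed.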

<figure><img src="/files/t524VFa5FtPDBExmHA7Z" alt=""><figcaption></figcaption></figure>

Together, these configuration options let the BDB Platform allocate resources flexibly, scalably, and cost-effectively across diverse and demanding analytics and machine learning workloads.

