A resource configuration tab is available when configuring a component.
The Data Pipeline lets users configure the resources (Memory and CPU) for each component that gets deployed.
Components use one of two deployment types:
Docker
Spark
When a component and its pipeline are saved, the component is saved with the pipeline's default resource configuration (Low, Medium, or High). After the pipeline is saved, the configuration tab becomes visible in the component. It contains the following settings:
Docker components require Request and Limit configurations.
Users can configure the CPU and Memory options.
CPU: The number of cores to assign to the component.
Please Note: In the configuration of Docker components, 1000 means 1 core.
For example, entering 100 assigns 0.1 core to the component.
Memory: The amount of memory to dedicate to the component.
Please Note: In the configuration of Docker components, 1024 means 1 GB.
Instances: The number of instances used for parallel processing. If users specify N instances, N pods are deployed.
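The unit conventions in the notes above can be checked with a small conversion sketch. The function names are hypothetical and are not part of the Data Pipeline; the only facts used are that 1000 CPU units equal 1 core and 1024 memory units equal 1 GB.

```python
# Hypothetical helpers illustrating the Docker component unit conventions.
# Assumption: 1000 CPU units = 1 core, 1024 memory units = 1 GB (from the notes).

def cpu_units_to_cores(cpu_units: int) -> float:
    """Convert the CPU value entered in the configuration to cores."""
    return cpu_units / 1000


def memory_units_to_gb(memory_units: int) -> float:
    """Convert the Memory value entered in the configuration to GB."""
    return memory_units / 1024


print(cpu_units_to_cores(1000))   # 1.0 core
print(cpu_units_to_cores(100))    # 0.1 core
print(memory_units_to_gb(1024))   # 1.0 GB
```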
Spark components have an option to set the partition factor in the Basic Information tab. This is critical for parallel Spark jobs.
The following example shows how to set it:
E.g., if users need to run 10 parallel Spark processes to write data and the input Kafka topic has 5 partitions, they must set the partition factor to 2 (i.e., 5 * 2 = 10 jobs). For this to work, the number of cores multiplied by the number of instances should also equal 10, e.g., 2 cores * 5 instances = 10 jobs.
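The arithmetic in this example can be sketched as a quick check. The function names below are hypothetical; the sketch only restates the rule that Kafka partitions multiplied by the partition factor should match cores multiplied by instances.

```python
# Hypothetical helpers restating the partition factor arithmetic.
# Assumption: parallel jobs = Kafka topic partitions * partition factor,
# and cores * instances should equal that number.

def parallel_jobs(topic_partitions: int, partition_factor: int) -> int:
    """Number of parallel Spark jobs produced by the partition factor."""
    return topic_partitions * partition_factor


def available_slots(cores_per_instance: int, instances: int) -> int:
    """Total cores available across all instances."""
    return cores_per_instance * instances


jobs = parallel_jobs(topic_partitions=5, partition_factor=2)
slots = available_slots(cores_per_instance=2, instances=5)

print(jobs, slots)      # 10 10
print(jobs == slots)    # True: the configuration is balanced
```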
The configuration of Spark components differs slightly from that of Docker components. When a Spark component is deployed, two pods come up:
Driver
Executor
Provide the Driver and Executor configurations separately.
Instances: The number of instances used for parallel processing. If N is given as the number of instances in the Executor configuration, N executor pods are deployed.
Please Note: As of the current release, deploying a driver requires a minimum of 0.1 core, and deploying an executor requires a minimum of 1 core. These minimums may change with upcoming versions of Spark.
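A minimal sketch of checking a planned Spark configuration against these minimums is shown below. The function and constant names are hypothetical, values are expressed in cores, and the thresholds are taken from the note above, so they may need updating for later releases.

```python
# Hypothetical check against the minimums stated in the note above.
# Assumption: values are in cores; thresholds may change in later releases.

MIN_DRIVER_CORES = 0.1
MIN_EXECUTOR_CORES = 1.0


def meets_spark_minimums(driver_cores: float, executor_cores: float) -> bool:
    """Return True when both pods satisfy the stated minimum core counts."""
    return driver_cores >= MIN_DRIVER_CORES and executor_cores >= MIN_EXECUTOR_CORES


print(meets_spark_minimums(driver_cores=0.1, executor_cores=1.0))  # True
print(meets_spark_minimums(driver_cores=0.5, executor_cores=0.5))  # False
```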