# Creating a New Job

Jobs are used for ingesting and transferring data from separate sources. The user can transform, unify, and cleanse data to make it suitable for analytics and business reporting. Because a Job bypasses the Kafka topic, the entire flow is much faster.

{% hint style="success" %}
*Check out the given demonstration to understand how to create and activate a job.*
{% endhint %}

{% embed url="https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FARWwq0mHnUobVAJqtFtw%2Fuploads%2FUk5UT9vbEEAUzHYBjUtr%2FCreating%20a%20New%20Job.mp4?alt=media&token=f426c090-4ef6-4929-a6f8-296f134af191" %}
***Creating a New Job***
{% endembed %}

* Navigate to the ***Data Pipeline*** homepage.
* Click on the ***Create Job*** icon.

![](https://content.gitbook.com/content/q2i9CKCFbySxr6jRoJfA/blobs/WJvjE9phKGue2xb5DTBy/3.png)

* The ***New Job*** dialog box appears, prompting the user to create a new Job.
* Enter a name for the new Job.
* Describe the Job (optional).
* **Job Baseinfo:** In this field, there are three options:
  * **Spark Job**
  * **PySpark Job**
  * **Python Job**
* **Trigger By**: The current Job can be triggered by the success or failure of another Job. There are two options:
  * **Success Job**: On successful execution of the selected Job, the current Job is triggered.
  * **Failure Job:** On failure of the selected Job, the current Job is triggered.
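The chaining semantics above can be sketched in plain Python. This is not the platform's implementation, only a hypothetical model of how a current Job runs after the selected Job, depending on its outcome:

```python
def run_chained(upstream_job, current_job, trigger_on="success"):
    """Hypothetical model of 'Trigger By' semantics.

    Runs upstream_job, then triggers current_job only if the upstream
    outcome matches the configured trigger condition.
    """
    try:
        upstream_job()
        outcome = "success"
    except Exception:
        outcome = "failure"
    if outcome == trigger_on:
        current_job()
    return outcome

# Usage: trigger the current job only when the upstream job fails.
log = []
run_chained(lambda: 1 / 0, lambda: log.append("triggered"), trigger_on="failure")
print(log)  # ['triggered']
```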
* **Is Scheduled?**
  * A Job can be scheduled for a particular timestamp; it is then triggered at that same timestamp on every scheduled occurrence.
  * Jobs must be scheduled in UTC.
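Because the scheduler interprets timestamps in UTC, convert the intended local run time before entering it. A minimal Python sketch (the 09:30 IST run time below is just an illustrative value):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# Illustrative example: a Job intended to run daily at 09:30 IST.
local_run = datetime(2024, 1, 15, 9, 30, tzinfo=ZoneInfo("Asia/Kolkata"))

# Convert to UTC -- this is the time to enter when scheduling the Job.
utc_run = local_run.astimezone(ZoneInfo("UTC"))
print(utc_run.strftime("%H:%M"))  # 04:00
```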
* **Concurrency Policy:** The concurrency policy determines how the scheduler handles overlapping runs: what happens when a Job becomes due while a previous run of the same Job is still executing, and how resources are allocated among the competing runs.&#x20;

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark>* ***Concurrency Policy*** *will appear only when **Is Scheduled** is enabled.*
{% endhint %}

* There are three **Concurrency Policy** options available:
  1. **Allow:** If a job is scheduled for a specific time and the first process is not completed before the next scheduled time, the next task will run in parallel with the previous tasks.<br>

     <figure><img src="https://content.gitbook.com/content/q2i9CKCFbySxr6jRoJfA/blobs/ZfE4YekFY7yravIr4aUr/image.png" alt=""><figcaption><p>Allow concurrency policy</p></figcaption></figure>

  2. **Forbid:** If a job is scheduled for a specific time and the first process is not completed before the next scheduled time, the next task will wait until all the previous tasks are completed.<br>

     <figure><img src="https://content.gitbook.com/content/q2i9CKCFbySxr6jRoJfA/blobs/KHoKbqGhHdi2ciEWoKQD/image.png" alt=""><figcaption><p>Forbid concurrency policy</p></figcaption></figure>

  3. **Replace:** If a job is scheduled for a specific time and the first process is not completed before the next scheduled time, the previous task will be terminated and the new task will start processing.<br>

     <figure><img src="https://content.gitbook.com/content/q2i9CKCFbySxr6jRoJfA/blobs/A7CY8EjBrMJf3LcX4zuJ/MicrosoftTeams-image%20(157).png" alt=""><figcaption><p>Replace concurrency policy</p></figcaption></figure>
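The three policies can be modeled with a small sketch. This is not the platform's scheduler, just hypothetical code showing what happens when a new run becomes due while a previous run is still executing:

```python
def on_schedule_due(policy, running_tasks):
    """Hypothetical model of the three concurrency policies.

    running_tasks: ids of runs still executing when the schedule fires.
    Returns (tasks_after, new_run_started).
    """
    if policy == "Allow":
        # New run starts in parallel with unfinished previous runs.
        return running_tasks + ["new"], True
    if policy == "Forbid":
        # New run waits until all previous runs have completed.
        if running_tasks:
            return running_tasks, False
        return ["new"], True
    if policy == "Replace":
        # Previous runs are terminated; the new run starts immediately.
        return ["new"], True
    raise ValueError(f"unknown policy: {policy}")

# Previous run "old" is still executing when the next schedule fires:
print(on_schedule_due("Allow", ["old"]))    # (['old', 'new'], True)
print(on_schedule_due("Forbid", ["old"]))   # (['old'], False)
print(on_schedule_due("Replace", ["old"]))  # (['new'], True)
```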

* **Spark Configuration**
  * Select a resource allocation option using the radio button. The given choices are:
    * Low
    * Medium
    * High
  * This feature is used to deploy the Job with a high-, medium-, or low-end configuration according to the velocity and volume of data that the ***Job*** must handle.
  * Also, provide resources to the Driver and Executor according to the requirement.

<figure><img src="https://content.gitbook.com/content/q2i9CKCFbySxr6jRoJfA/blobs/NCLFfFo2LxlZt8Til9yb/image.png" alt=""><figcaption></figcaption></figure>

<figure><img src="https://content.gitbook.com/content/q2i9CKCFbySxr6jRoJfA/blobs/yBAFbuRwhwFmPj0jUiwC/image.png" alt=""><figcaption><p>Providing resources to the driver and executor for the Job.</p></figcaption></figure>
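For reference, the driver and executor resources roughly correspond to standard Apache Spark submission properties. The flag names below are real `spark-submit` options, but the values are hypothetical and the platform's internal mapping is not documented here:

```shell
spark-submit \
  --driver-memory 2g \
  --driver-cores 1 \
  --executor-memory 4g \
  --executor-cores 2 \
  --num-executors 3 \
  my_job.py
```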

* Click the ***Save*** option to create the Job.&#x20;

<figure><img src="https://content.gitbook.com/content/q2i9CKCFbySxr6jRoJfA/blobs/tA9A7OTjuZ1Do5UfkCgK/image.png" alt=""><figcaption></figcaption></figure>

* A success message appears to confirm the creation of a new job.
* The ***Job Editor*** page opens for the newly created job.

<figure><img src="https://content.gitbook.com/content/q2i9CKCFbySxr6jRoJfA/blobs/MDmN9zEPb02kgNigzaS7/image.png" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark>*&#x20;

* *The **Trigger By** feature will not work if the selected Trigger By Job is running in **Development** mode. It works only when the selected Trigger By Job is activated.*
* *By clicking the **Save** option, the user gets redirected to the job workflow editor.*
{% endhint %}
