# Create Job

Jobs ingest and transfer data from separate sources. With a job, the user can transform, unify, and cleanse data to make it suitable for analytics and business reporting without routing it through a Kafka topic, which makes the overall flow faster.

{% hint style="success" %}
*Check out the given demonstration to understand how to create and activate a job.*
{% endhint %}

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fuq3RSHHup7fjHYaspk7y%2Fuploads%2FpjIj0X1t4hzbvzGiKMr6%2Fezgif-1-8be81f1bac.mp4?alt=media&token=b39ed06a-92c4-45ab-a1ea-bf16475e49c0>" %}
Creating and activating a job.
{% endembed %}

## **Creating a new Job**

* Navigate to the ***Data Pipeline*** homepage.
* Click on the ***Create Job*** icon.

![](/files/V33Mi1KFlGn5ZjcgHM2v)

* The ***New Job*** dialog box appears, prompting the user to create a new Job.
* Enter a name for the new Job.
* Describe the Job (optional).
* **Job Baseinfo:** This field offers two options:
  * **Spark Job**
  * **PySpark Job**
* **Trigger By:** The current job can be triggered by the success or failure of another job. There are two options:
  * **Success Job:** On successful execution of the selected job, the current job is triggered.
  * **Failure Job:** On failure of the selected job, the current job is triggered.
* **Is Scheduled?**
  * A job can be scheduled for a particular time. The job is triggered each time that scheduled time recurs.
  * The schedule must be specified in UTC.
* **Concurrency Policy:** The concurrency policy controls how scheduled runs of a job are handled when they overlap, that is, when one run has not completed before the next scheduled time. It determines the order in which competing runs execute and how they share resources.

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark> The **Concurrency Policy** option appears only when **Is Scheduled?** is enabled.*
{% endhint %}
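
Because schedules are interpreted in UTC, a local run time has to be converted before it is entered under **Is Scheduled?**. The following is a minimal sketch of that conversion using only the Python standard library. The function name `to_utc` and the UTC+05:30 example zone are illustrative assumptions and are not part of the product.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_dt: datetime) -> datetime:
    """Convert a timezone-aware local datetime to UTC."""
    if local_dt.tzinfo is None:
        # A naive datetime carries no offset, so it cannot be converted safely.
        raise ValueError("local_dt must be timezone-aware")
    return local_dt.astimezone(timezone.utc)


# Hypothetical example: the job should run at 09:00 in a UTC+05:30 zone.
local_zone = timezone(timedelta(hours=5, minutes=30))
local_run = datetime(2024, 1, 15, 9, 0, tzinfo=local_zone)

utc_run = to_utc(local_run)
print(utc_run.strftime("%H:%M"))  # 03:30
```

In this example, the time to enter in the scheduler would be 03:30, not 09:00.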

* Three **Concurrency Policy** options are available:
  1. **Allow:** If the previous run of a scheduled job has not completed before the next scheduled time, the next run starts in parallel with the previous runs.<br>

     <figure><img src="/files/HZ6ZvTnYSdlVf6Yfc82Z" alt=""><figcaption><p>Allow concurrency policy</p></figcaption></figure>

  2. **Forbid:** If the previous run of a scheduled job has not completed before the next scheduled time, the next run waits until all previous runs have completed.<br>

     <figure><img src="/files/6lAtSeouYAT6AtZL0z0R" alt=""><figcaption><p>Forbid concurrency policy</p></figcaption></figure>

  3. **Replace:** If the previous run of a scheduled job has not completed before the next scheduled time, the previous run is terminated and the new run starts processing.<br>

     <figure><img src="/files/qV2u0AIeUYFybwL6Z8A7" alt=""><figcaption><p>Replace concurrency policy</p></figcaption></figure>
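
The three policies can be compared with a small simulation. The sketch below models only the behavior described in the list above; it is not the platform's scheduler. The names `ConcurrencyPolicy` and `simulate`, and the abstract time units, are assumptions made for illustration.

```python
from enum import Enum


class ConcurrencyPolicy(Enum):
    ALLOW = "Allow"
    FORBID = "Forbid"
    REPLACE = "Replace"


def simulate(policy, schedule, duration):
    """Return (start, end) pairs for each scheduled run under a policy.

    schedule: sorted list of scheduled times (abstract time units)
    duration: time a run needs to finish if it is not interrupted
    """
    runs = []
    for t in schedule:
        if policy is ConcurrencyPolicy.ALLOW:
            # Start on time, even if earlier runs are still active.
            runs.append([t, t + duration])
        elif policy is ConcurrencyPolicy.FORBID:
            # Wait until the previous run has finished.
            previous_end = runs[-1][1] if runs else t
            start = max(t, previous_end)
            runs.append([start, start + duration])
        else:  # ConcurrencyPolicy.REPLACE
            # Terminate a still-active previous run, then start on time.
            if runs and runs[-1][1] > t:
                runs[-1][1] = t
            runs.append([t, t + duration])
    return [tuple(run) for run in runs]


# A job scheduled every 10 units whose runs take 15 units to finish.
schedule = [0, 10, 20]
print(simulate(ConcurrencyPolicy.ALLOW, schedule, 15))    # [(0, 15), (10, 25), (20, 35)]
print(simulate(ConcurrencyPolicy.FORBID, schedule, 15))   # [(0, 15), (15, 30), (30, 45)]
print(simulate(ConcurrencyPolicy.REPLACE, schedule, 15))  # [(0, 10), (10, 20), (20, 35)]
```

With **Allow**, runs overlap; with **Forbid**, each run is delayed until the previous one ends; with **Replace**, earlier runs are cut short at the next scheduled time.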

* **Spark Configuration**
  * Select a resource allocation option using the radio button. The given choices are:
    * Low
    * Medium
    * High
  * This option deploys the Job with a high, medium, or low-end configuration, according to the velocity and volume of data that the ***Job*** must handle.
  * Also, allocate resources to the Driver and Executor as required.

<figure><img src="/files/DajaaQu9uBhPXCNchR7A" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/6EacaADYslpcKtT80Rrb" alt=""><figcaption><p>Providing resources to the driver and executor for the Job.</p></figcaption></figure>
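
To make the Low, Medium, and High choices more concrete, the sketch below shows how such levels could map to standard Apache Spark driver and executor properties. The property keys are real Spark configuration names, but the preset values, the `SparkResources` class, and the `to_spark_conf` function are hypothetical. The platform's actual presets are not documented on this page and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparkResources:
    driver_cores: int
    driver_memory: str
    executor_cores: int
    executor_memory: str
    executor_instances: int


# Hypothetical presets, chosen only for illustration.
PRESETS = {
    "Low": SparkResources(1, "1g", 1, "1g", 1),
    "Medium": SparkResources(1, "2g", 2, "4g", 2),
    "High": SparkResources(2, "4g", 4, "8g", 4),
}


def to_spark_conf(level: str) -> dict:
    """Translate a resource level into standard Spark configuration keys."""
    r = PRESETS[level]
    return {
        "spark.driver.cores": str(r.driver_cores),
        "spark.driver.memory": r.driver_memory,
        "spark.executor.cores": str(r.executor_cores),
        "spark.executor.memory": r.executor_memory,
        "spark.executor.instances": str(r.executor_instances),
    }


for key, value in to_spark_conf("Medium").items():
    print(f"{key}={value}")
```

A higher level gives the driver and executors more cores and memory, which suits jobs that handle larger or faster-arriving data.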

* Click the ***Save*** option to create the job.

<figure><img src="/files/GSxYe6ZwhSFNCMvP32sy" alt=""><figcaption></figcaption></figure>

* A success message appears to confirm the creation of a new job.
* The ***Job Editor*** page opens for the newly created job.

<figure><img src="/files/GIEQ6iQgVzSfqKdaqLjp" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark>*

* *The **Trigger By** feature does not work if the job selected under Trigger By is running in **Development** mode. It works only when the selected job is activated.*
* *Clicking the **Save** option redirects the user to the job workflow editor.*
  {% endhint %}
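
The **Trigger By** rule in the note above can be summarized as a small decision function. This is a sketch of the documented behavior only, not the platform's implementation. The `UpstreamJob` class, its fields, and `should_trigger` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class UpstreamJob:
    name: str
    activated: bool   # False models a job running in Development mode
    last_status: str  # "success" or "failure"


def should_trigger(upstream: UpstreamJob, trigger_on: str) -> bool:
    """Decide whether the current job is triggered by the upstream job.

    trigger_on: "success" for the Success Job option,
                "failure" for the Failure Job option.
    """
    if trigger_on not in ("success", "failure"):
        raise ValueError("trigger_on must be 'success' or 'failure'")
    if not upstream.activated:
        # Trigger By has no effect while the selected job is in Development mode.
        return False
    return upstream.last_status == trigger_on


active = UpstreamJob("ingest_orders", activated=True, last_status="success")
in_dev = UpstreamJob("ingest_orders", activated=False, last_status="success")

print(should_trigger(active, "success"))  # True
print(should_trigger(active, "failure"))  # False
print(should_trigger(in_dev, "success"))  # False
```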


