Register & Publish Notebook

Transform your notebook from a static document into a reusable, versioned, and shareable asset across modules.

You can register a data science script to make it a reusable component within your projects. Once registered, the script can be accessed by other users and integrated into different workflows and pipelines without being manually imported each time.

This process involves:

  • Standardization: Registering a script ensures it adheres to a defined structure and interface, making it easier to use across various projects.

  • Version Control: Registered scripts are versioned, allowing you to track changes and roll back to previous versions if needed.

  • Reuse: A published notebook can be easily discovered and integrated into new projects without manually copying the code. This prevents redundancy and promotes code sharing across teams.

Export as a Script

The Export as a Script functionality in Data Science Lab (DSLab) allows users to export a notebook script to the Data Pipeline module.

Navigation path: Data Science Lab > Workspace > Repo Folder > Notebook > Ellipsis > Register > Export as a Script

Steps to export a Data Science Script:

  • Navigate to the Repo folder in the Workspace tab.

  • Select the Notebook that you want to export.

  • Click the Ellipsis (three-dot) icon next to the selected notebook.

  • From the Context Menu, click Register.

  • The Register window opens.

  • Select the script containing a function. You may use the Select All option if needed.

  • Click Next to proceed.

  • Select the Export as a Script option by checking the corresponding checkbox.

  • The preview of the selected script appears.

  • Click Finish.

Note: The user must write a function inside the notebook to use the Export as a Script functionality.
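
For reference, the snippet below is a minimal sketch of such a function, assuming the exported script receives its input as a pandas DataFrame; the function name, parameter, and transformation are illustrative assumptions, not a fixed platform API.

    import pandas as pd

    def transform_data(df: pd.DataFrame) -> pd.DataFrame:
        """Illustrative start function (hypothetical name and logic)."""
        # Drop completely empty rows and return the cleaned data to the pipeline.
        return df.dropna(how="all").reset_index(drop=True)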

Opening External Libraries

  • Click the External Libraries icon.

  • The Libraries drawer opens, displaying the available external libraries.

  • Select the required libraries using checkboxes.

  • Click the Close icon to close the Libraries drawer.

  • You are redirected to the Register page.

  • Click Finish to complete the export.

Accessing an Exported Script in the Data Pipeline Module

Once the script is exported to the Data Pipeline module, it can be consumed within a DS Lab Runner component.

Steps to Access the Exported Script:

  • Navigate to the Data Engineering module.

  • Open the Pipelines section to display the list of existing pipelines.

  • Select a pipeline that contains the DS Lab Runner component from the list.

  • Open the Meta Information tab of the DS Lab Runner component.

  • Select the following information:

    • Execution Type: Choose Script Runner from the drop-down menu.

    • Function Input Type: Select one of the following options (a sketch of a List-type start function follows this list):

      • Data Frame

      • List

    • Project Name: Select the project name from the drop-down menu.

    • Script Name: Select the script name from the drop-down menu.

    • External Library: Specify any external libraries that the script requires.

    • Start Function: Choose the start function name from the drop-down menu.
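
As a rough illustration of the List input type mentioned above, the start function below accepts a list of dictionaries (records); the function name and field names are assumptions made for this sketch, not values prescribed by the platform.

    from typing import Any, Dict, List

    def enrich_records(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Illustrative start function for the List input type (hypothetical)."""
        enriched = []
        for record in records:
            record = dict(record)          # avoid mutating the caller's data
            record["processed"] = True     # placeholder derived field
            enriched.append(record)
        return enriched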

Notes:

  • The Export as a Script functionality requires that a function be written within the notebook. This function will be executed when the script is run in the Data Pipeline module.

  • Once the script is exported, it becomes available to the DS Lab Runner component for use in pipelines.

  • The exported script is accessible by selecting the appropriate project name and script name in the Meta Information tab of the DS Lab Runner component.

Registering and Re-Registering a Data Science Script as a Job

In the Data Science Lab (DSLab), users can register or re-register Data Science Scripts as Jobs in the Data Engineering module. This functionality allows users to schedule and execute the scripts and to configure job-specific settings such as the execution environment, payloads, and concurrency policies.

Key Features:

  • On-demand Jobs: Python jobs that run without a predefined schedule.

  • Concurrency Policy: Manages how tasks are handled when overlapping execution times occur.

  • Alerts: Configure notifications for job success or failure.

Steps to Register a Data Science Script as a Job

Navigation path: Data Science Lab > Workspace > Repo Folder > Notebook > Ellipsis > Register > Register as a Job

  1. Navigate to the Project Workspace where your notebook resides.

  2. Open the Repo folder and select the notebook (.ipynb file) that you want to register as a job.

  3. Click the Ellipsis (three-dot) icon for the selected notebook.

  4. From the Context Menu, select Register.

  5. The Register window opens.

  6. Select the script containing a function. You may use the Select All option if needed.

  7. Click Next to proceed.

  8. Select the Register as a Job option for the selected script.

  9. A preview of the selected script will be displayed below.

  10. Click Next to proceed.

Job Configuration Settings:

  • Enter the Scheduler Name for the job.

  • Enter the Scheduler Description for the job.

  • Select the Start Function from the dropdown.

  • Select the Job Base Info.

  • On-demand:

    • If selected, the job will not be scheduled but executed on demand.

    • The Payload option will appear, where the user must enter the payload as a list of dictionaries (see the illustrative example after this list).

  • Concurrency Policy:

    • Select a concurrency option (only available for jobs with a scheduler configured).

    • Options:

      • Allow: Run the next task in parallel if the first task has not completed before the next scheduled time.

      • Forbid: Wait for the first task to complete before starting the next task.

      • Replace: Terminate the previous task and start the new task when the next scheduled time arrives.

Note: The Concurrency Policy option is only available for jobs where the scheduler is configured. For On-demand jobs, this option is not displayed.

  • Scheduler Time: Provide the time using the Cron generator for scheduling the job (only visible for scheduled jobs).

  • Alert: Configure job alerts to send notifications to Teams or Slack channels upon success or failure.

  • Click Finish to complete the registration process.
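
As referenced in the On-demand step above, the payload is a list of dictionaries; the keys and values below are hypothetical placeholders and should match whatever parameters your start function expects.

    # Illustrative On-demand payload: keys and values are placeholders.
    payload = [
        {"region": "north", "threshold": 0.8},
        {"region": "south", "threshold": 0.6},
    ]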

Steps to Re-Register a Data Science Script as a Job

  • Navigate to the Project Workspace and select the previously registered .ipynb file.

  • Click the Ellipsis (three-dot) icon for the selected notebook.

  • From the Context Menu, select Register.

  • In the Register window, select the Re-Register option using the checkbox.

  • Choose the version you want to re-register by using the checkbox.

  • Click Next to continue.

  • The script will be pre-selected for re-registration. Click Next.

  • A notification will appear confirming that the script is valid.

  • Click Next again to proceed.

Job Configuration for Re-Registration:

  • Start Function: Select the function from the drop-down menu to use as the entry point for the job.

  • Job Base Info: Select the appropriate job type (e.g., Python Job or PySpark Job).

  • Docker Config: Choose the resource allocation (Low, Medium, High).

    • Request (CPU/Memory): Configure the required resources for the job.

Note: If On-demand is checked, it will function as described in the registration section above. The Payload field will appear for entering the job parameters.

  • Click Finish to re-register the job.

Registering a Data Science Script as a New Job

  • Follow the same steps as in the Re-Register a Data Science Script as a Job section.

  • In the Register window, select Register as New using the checkbox.

  • Complete the configuration as described for the Re-Register section.

  • Click Finish to create a new job from the selected script.

Please Note:

  • On-demand Jobs: These jobs will not be scheduled but can be triggered manually. The payload must be entered for execution.

  • Concurrency Policy: This is not available for On-demand jobs. It is only visible when a scheduler is configured for the job.

  • Alerts: Notifications can be sent to Teams or Slack based on job success or failure.

Accessing a Registered Job in the Jobs List

Registered jobs can be accessed from the Jobs List page within the Data Engineering module.

Navigation path: Data Engineering > Jobs > Jobs List

  • Navigate to the Jobs section within the Data Engineering module.

Publish as a Component

Navigation path: Data Science Lab > Workspace > Repo Folder > Notebook > Ellipsis > Publish as a Component

  • Navigate to the Project Workspace where your notebook resides.

  • Open the Repo folder and select the .ipynb file that you want to publish as a component.

  • Click the Ellipsis (three-dot) icon for the selected notebook.

  • From the Context Menu, select Publish as a Component.

  • The Publish window opens, displaying the script.

  • Click Next to validate the script.

  • After the success notification confirms that the script is valid, select the Publish as a Component option.

  • The preview of the selected script will appear below.

  • Click Next to proceed.

  • The Publish window opens.

  • Enter the Component configuration details to publish the script as a component:

    • Enter a Component Name

    • Enter the Component Description (optional)

    • Select a Start Function

    • Select a Function Input Type: either Data Frame or List of Dictionary (a sketch of a compatible Start Function follows these steps).

  • Click Finish to complete the publish action.
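
As a rough sketch of the Start Function a published component might wrap (here using the Data Frame input type), the example below groups and aggregates the incoming data; the function name and column names are illustrative assumptions.

    import pandas as pd

    def aggregate_amounts(df: pd.DataFrame) -> pd.DataFrame:
        """Illustrative Start Function for a component (hypothetical columns)."""
        # Aggregate the input and return a DataFrame for the next pipeline stage.
        return df.groupby("category", as_index=False)["amount"].sum()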

Accessing the Published Script as a Component

Once the script is published as a component, the user can access it inside the Custom Components section.

  • Navigate to the Data Engineering module.

  • Open a pipeline or create a new one.

  • Open the Components menu.

  • The user can drag the custom component to the workspace and map it in a pipeline workflow.

Note: Please refer to the Creating a Python Data Processing Job and On-Demand Python Job Execution using the BDB Platform sections to understand how to register a DSL Script as a job.