Export to Pipeline
A Notebook can be exported to the Data Pipeline module by using this option.
Check out the walk-through on how to export a Notebook script to the Data Pipeline module.
Navigate to the Notebook list.
Click the Export to Pipeline icon for a Notebook.
The Export to Pipeline dialog box opens.
Select a specific function using the checkbox.
Click the Next option.
Please Note: The user must wrap the Notebook script inside a function to use the Export to Pipeline functionality (a sketch of such a function follows these steps).
Click the Export option on the Pipeline Export page that opens next.
A confirmation message appears, indicating that the export is complete.
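As noted above, the exported script must be written inside a function. The sketch below shows one minimal way such a Notebook script could look; the function name (clean_orders), the input path, and the transformation are illustrative assumptions, not part of the product itself.

```python
# Minimal sketch of a Notebook script prepared for Export to Pipeline.
# Only the requirement that the script is wrapped in a function comes from
# this documentation; everything else here is an illustrative assumption.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def clean_orders(threshold=100):
    # Build (or reuse) a Spark session inside the exported function.
    spark = SparkSession.builder.appName("clean_orders").getOrCreate()

    # Hypothetical input path; replace with the dataset used in your Notebook.
    df = spark.read.csv("/data/orders.csv", header=True, inferSchema=True)

    # Example transformation: keep orders above a configurable threshold.
    return df.filter(F.col("amount") > threshold)
```

In this sketch, clean_orders would later be the name selected as the Start Function when configuring the PySpark Job component.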
Navigate to the Data Pipeline homepage.
Click the Create Job option.
The New Job dialog window opens.
Provide the required information to create a new job.
Enter Name: Provide a name for the new Job.
Job Description: Enter a description for the new Job.
Job Baseinfo: Select the PySpark Job option using the drop-down.
Trigger By: The PySpark Job can be triggered by another Job or PySpark Job. It can be triggered by another job in two scenarios:
On Success: Select a job from the drop-down. Once the selected job runs successfully, it triggers the PySpark Job.
On Failure: Select a job from the drop-down. Once the selected job fails, it triggers the PySpark Job.
Is Scheduled: Put a check mark in the given box to schedule the new Job.
Spark Config: Select a resource configuration for the new Job.
Click the Save option.
A notification message appears, and the new Job gets created.
The newly created Job appears in the Job Editor workspace by default.
Click on the Job component to open the configuration tabs.
Open the Meta Information tab of the PySpark Job component.
Project Name: Using the drop-down menu, select the Project in which the concerned Notebook was created.
Script Name: Select the script that was exported from the Notebook in the DS Lab module. The script written in the DS Lab module must be wrapped inside a function.
External Library: If any external libraries are used in the script, mention them here. Multiple libraries can be listed by separating the names with commas (,).
Start Function: Select the name of the function in which the script has been written.
Script: The exported script appears in this space.
Input Data: If the function takes any parameters, provide each parameter name as the Key and the corresponding parameter value as the Value in this field (see the illustration after these steps).
Click the Save Component in Storage option to use the PySpark component in a workflow inside the Data Pipeline module.
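For illustration only, the sketch below shows how the Input Data field relates to the exported function: each Key must match a parameter name of the Start Function, because the function is effectively called with those key/value pairs. The clean_orders name and the threshold value are assumptions carried over from the earlier sketch, not values prescribed by the product.

```python
# Conceptual illustration only; not the platform's internal implementation.

def clean_orders(threshold=100):
    # Stand-in for the exported Notebook function from the earlier sketch.
    return threshold

# Meta Information (illustrative values):
#   Start Function   : clean_orders
#   Input Data       : Key = "threshold", Value = 100
#   External Library : only libraries the script actually imports, comma-separated

input_data = {"threshold": 100}       # as entered in the Input Data field
result = clean_orders(**input_data)   # the Start Function receives threshold=100
```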
Please Note: Refer to the Data Science Lab Quick Start Flow page for an overview of the Data Science Lab module in a nutshell. Click here to be redirected to the Quick Start Flow page.