Python Script
The Python Script component allows users to write their own custom Python scripts and run them in the pipeline. It also enables users to run scripts written in a DSLab notebook directly in the pipeline.
Check out the given demonstrations to understand the configuration steps involved in the Python Script component.
All component configurations are classified broadly into 3 sections:
Meta Information
Please Note: Do not use 'test' as the component name, and do not start the component name with 'test', in the Component Name field of the Meta Information of the Python Script component. The word 'test' is reserved at the backend for development processes.
Component Name: Provide a name for the component. Please note that the component name must not contain spaces or special characters. Use the underscore symbol (_) in place of spaces between words.
Start Function Name: Displays all the function names used in the Python script in a drop-down menu. Select the function from which the execution should start.
In Event Data Type: The user will find two options here:
DataFrame
List of Dictionary
External Libraries: The user can provide external Python libraries in order to use them in the script. Multiple library names can be entered, separated by commas.
Execution Type: Select the Type of Execution from the drop-down. There are two execution types supported:
Custom Script: The user can write their own custom Python script in the Script field.
Script: The user can write their own custom Python script in this field. Make sure the script contains at least one function. The user can also validate the script by clicking the Validate Script option in this field.
Start Function: All the function names used in the script will be listed here. Select the start function to execute the Python script.
Input Data: If the function takes parameters, provide each parameter name as the Key and the corresponding value as the Value in this field.
DSLab Script: In this execution type, the user can use a script exported from a DSLab notebook. The user needs to provide the following information if this option is selected as the Execution Type:
Project Name: Use the drop-down menu to select the Project in which the Notebook has been created.
Script Name: This field lists the Notebook names that have been exported from the Data Science Lab module to the Data Pipeline.
Start Function: All the function names used in the script will be listed here. Select the start function to execute the Python script.
Script: The exported script appears in this space. The user can also validate the script by clicking the Validate Script option in this field. For more information on exporting a script from the DSLab module, please refer to the following link: Exporting a Script from DSLab.
Input Data: If the function takes parameters, provide each parameter name as the Key and the corresponding value as the Value in this field.
Pull script from VCS: It allows the user to pull the desired committed script from the VCS.
Push script to VCS: It allows the user to commit different versions of a script to the VCS.
This feature enables the user to send data directly to the Kafka Event or data sync event connected to the component. Below is the command to configure Custom Kafka Producer in the script:
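A minimal sketch of the produce command is given below, assuming `kaf_obj` is made available to the script by the pipeline at runtime (its parameters are described later in this section):

```python
# Send df to the Kafka event connected to the component.
# kaf_obj is assumed to be injected into the script's scope by the pipeline runtime.
kaf_obj.kafka_produce(df, "@EVENT.OUTEVENT", "optional message appended to every row")
```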
The Python Component has a custom logger feature that allows users to write their own custom logs, which will be displayed in the logs panel. Please refer to the code below for the custom logger:
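A minimal sketch, assuming `log_obj` is likewise made available to the script by the pipeline at runtime:

```python
# Write a custom message to the component's logs panel.
# log_obj is assumed to be injected into the script's scope by the pipeline runtime.
log_obj.info("Processing started for the current batch")
```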
Sample Python code to produce data using custom producer and custom logger:
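The sketch below is illustrative: the function name `custom_script` and the placeholder operation are assumptions, while `df`, `key1`, `key2`, `key3`, `log_obj.info()`, and `kaf_obj.kafka_produce()` follow the descriptions given after the code:

```python
def custom_script(df, key1, key2, key3):
    # df: data from the previous event (DataFrame or list of dictionaries)
    # key1, key2, key3: values supplied in the Input Data section
    log_obj.info("Input parameters: {}, {}, {}".format(key1, key2, key3))

    # ...perform the required operations on df here...

    # Send the processed data to the connected out event
    kaf_obj.kafka_produce(df, "@EVENT.OUTEVENT", "")
    # On failure, data can instead be sent to the connected failover event:
    # kaf_obj.kafka_produce(df, "@EVENT.FAILEVENT", "reason for failure")
    log_obj.info("Data produced to the out event")
```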
Here,
df: Data from the previous event connected to the Python component, in the form of a List or a DataFrame.
key1, key2, key3: Any parameters passed to the function from the Input Data section of the Meta Information of the Python Script component.
log_obj.info(): It is for custom logging and takes a string message as input.
kaf_obj.kafka_produce(): It is for the custom Kafka producer and takes the following parameters:
df: Data to produce – pandas.DataFrame and List of Dict types are supported.
Event name: Any Kafka event name in string format. If @EVENT.OUTEVENT is given, it sends data to the connected out event. If @EVENT.FAILEVENT is given, it sends the data to the failover event connected with the Python Script component.
Any Failed Message: A message in string format can be given to append to the output data. The same message will be appended to all rows of data (this field is optional).
The Custom Python Script transform component supports 3 types of scripts in the Data Pipeline.
1. As Reader Component: If you don't have any in Event, you can use a function with no arguments. For example:
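A minimal sketch of a no-argument reader function is shown below; the function name and the generated data are illustrative, and it is assumed that the returned value is forwarded to the connected out event:

```python
import pandas as pd

def read_data():
    # No in Event is connected, so the function generates its own data
    df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
    return df
```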
2. As Transformation Component: If you have data on which to execute some operation, use the first argument of the function to receive the data as a DataFrame or a list of dictionaries. For example:
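A minimal sketch, assuming the In Event Data Type is DataFrame (the function name and the operation are illustrative):

```python
def transform(df):
    # df holds the data coming from the previous event
    df = df.dropna()  # example operation: drop rows with missing values
    return df
```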
Here, df holds the data coming from the previous event as the argument to the parameter of the method.
3. Custom Argument with Data: If there is a custom argument along with the data frame, i.e., the data comes from the previous event and a custom argument is passed to a parameter of the function, then df will hold the data from the previous event, and the value for the second parameter can be given in the Input Data section of the component. For example:
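A minimal sketch, where `limit` is an illustrative custom parameter whose value would be supplied as a Key-Value pair in the Input Data section (Key: limit); the example assumes the In Event Data Type is DataFrame:

```python
def take_top_rows(df, limit):
    # df: data coming from the previous event
    # limit: custom argument supplied in the Input Data section
    return df.head(int(limit))
```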