Workflow 1

Read data from S3, apply transformations, run a trained model via AutoML, and write the output into a target database.

This workflow showcases the complete machine learning lifecycle — from raw data to production-ready insights — using a no-code/low-code approach. By the end of this workflow, users will have created a fully functional, automated pipeline that reads data from S3, applies transformations, runs the trained model, and writes the output into a target database.

Create a Sandbox and Upload the CSV File

· From the BDB Platform homepage, click on the Apps icon and navigate to the Data Center.

· Inside the Data Center, click the Sandbox button and then click Create.

· Upload your CSV file by dragging and dropping or browsing your system.

· After the file loads successfully, click Upload, and the sandbox will be created and available for use.

· In the sandbox list, click on the three dots next to your created Sandbox.

· Choose “Create Data Preparation” to begin cleaning and processing your data.

· This simplifies the cleaning process and sets a solid foundation for machine learning.

o Delete the Gender column: select the column → click Transforms → search for the Delete Column transform and click it. The Gender column is now deleted.

o Remove empty rows from Previous CTC and Offered CTC: click Transforms again → search for the Delete Empty Rows transform and apply it to each of the two columns. (A minimal pandas sketch of these two steps follows this list.)

· Once transformation steps are complete:

o Rename the data preparation for easy reference.

o All transformations are tracked in the Steps tab and can be removed from there.

o Click the Save button to finalize.
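For reference, here is a minimal pandas sketch of what these two transforms do, assuming the uploaded file is named hiring_data.csv (the file names are placeholders; the column names are the ones used above):

```python
# Minimal pandas sketch of the two preparation steps above.
# "hiring_data.csv" and "hiring_data_prepared.csv" are placeholder file names.
import pandas as pd

df = pd.read_csv("hiring_data.csv")

# Delete Column transform: remove the Gender column
df = df.drop(columns=["Gender"])

# Delete Empty Rows transform: drop rows with empty Previous CTC or Offered CTC
df = df.dropna(subset=["Previous CTC", "Offered CTC"])

df.to_csv("hiring_data_prepared.csv", index=False)
```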

Create and Run an AutoML Experiment Using the Data Science Lab

· Navigate to the AutoML section in the DS Lab Module and click the Create Experiment button.

· Configure the experiment with the following details:

o Experiment Name: Hiring Data

o Experiment Type: Classification

· Under the Configure Dataset option:

o Select Sandbox as the dataset source.

o Select the File Type as CSV from the dropdown.

o Choose the required sandbox from the dropdown menu.

· In the Advanced Information section:

o Select the Data Preparation process you created earlier.

o Set the Target Column to Gender.

· Click Save to initialize and start running the experiment.

· AutoML will automatically train and evaluate multiple candidate models in the background and select the best-performing one. (A rough scikit-learn sketch of this model-comparison step follows this section.)

· Once training is complete, click View Report to explore:

o Model accuracy and performance metrics

o Comparative results of the tested models

o Recommendations for the best-fit model
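As a rough illustration of what the AutoML experiment does behind the scenes, the sketch below trains a few candidate classifiers with scikit-learn and keeps the one with the best cross-validated score. This is not the platform's implementation; the file name is a placeholder and the target column is the one configured above:

```python
# Hedged sketch of an AutoML-style model comparison for a classification experiment.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

df = pd.read_csv("hiring_data_prepared.csv")      # placeholder file name
target = "Gender"                                 # target column as configured above
X = pd.get_dummies(df.drop(columns=[target]))     # simple encoding, for the sketch only
y = df[target]

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Evaluate each candidate with 5-fold cross-validation and pick the best score.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores)
print("best model:", best_name)
```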

Register the Best Model

· From the Model section, select the model you created using the AutoML workflow.

· Click the Register button (arrow icon) to register the model.

· Once registered, the model becomes available for use in both real-time and batch pipelines for prediction. (See the sketch below for the general idea.)
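Conceptually, registering a model amounts to persisting the winning estimator so that downstream batch or real-time scoring can load it; the platform handles this for you when you click Register. A hedged joblib sketch of the idea, with placeholder file and column names:

```python
# Illustrative only: fit the winning candidate on the full dataset and persist it.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("hiring_data_prepared.csv")      # placeholder file name
target = "Gender"                                 # target column as configured earlier
X = pd.get_dummies(df.drop(columns=[target]))
y = df[target]

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
joblib.dump(model, "hiring_model.joblib")         # placeholder artifact name
```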

Create a New Pipeline

· From the Apps menu, select the Data Pipeline Plugin.

· Create a new pipeline by clicking on the Pipeline option, then click Create.

· Provide an appropriate name for the pipeline and allocate the required resources.

· Click Save to store the pipeline configuration.

Add the S3 Reader Component

· Click the ‘+’ icon in the toolbar on the right side of the screen to open the Components Palette if it is not already visible on the canvas.

· In the search bar, type S3 Reader.

· Drag and drop the S3 Reader component onto the pipeline canvas.

· Set the Invocation Type to Real-Time, then move to the Meta Information tab.

· Enter the required details in Meta Information, including:

o Region

o File Type (CSV)

o Bucket Name

o Access Key

o Path Info

· Save the component once it has been configured. (A boto3 sketch of the equivalent S3 read follows this section.)

· From the Event Panel (located next to the Components Palette), click the ‘+’ icon to create a Kafka Event.

· Drag and drop the event onto the pipeline canvas.

· The event will automatically connect to the source component.
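The sketch below shows a boto3 equivalent of what the S3 Reader is configured to do: fetch a CSV object from the bucket and path given in the Meta Information tab. The region, bucket, key, and credentials are placeholders for the values you enter there:

```python
# Minimal boto3 sketch of reading a CSV from S3 into a DataFrame.
import io

import boto3
import pandas as pd

s3 = boto3.client(
    "s3",
    region_name="us-east-1",                  # Region (placeholder)
    aws_access_key_id="YOUR_ACCESS_KEY",      # Access Key (placeholder)
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Bucket Name and Path Info as configured in the Meta Information tab (placeholders).
obj = s3.get_object(Bucket="your-bucket-name", Key="path/to/hiring_data.csv")
df = pd.read_csv(io.BytesIO(obj["Body"].read()))
print(df.head())
```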

Add a Data Preparation Component

· From the Components Palette, go to the Transformation section and drag and drop the Data Preparation component onto the pipeline canvas.

· Set the Invocation Type to Batch, then move to the Meta Information tab.

· In Meta Information:

o Select the Data Center Type as Data Sandbox.

o Choose the appropriate Sandbox Name.

o Select the required Preparation.

· Save the component once configured.

· From the Event Panel (next to the Components Palette), click the ‘+’ icon to add a Kafka Event.

· Drag and drop the newly created event onto the pipeline canvas. (A minimal Kafka producer/consumer sketch of how events pass records between components follows this section.)

· Finally, add the AutoML component from the Components Palette.
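As a rough picture of how Kafka Events carry records from one component to the next, here is a minimal kafka-python sketch that consumes from an upstream topic, applies the two preparation steps, and publishes to a downstream topic. The topic names and broker address are placeholders; the platform creates and wires the real events for you:

```python
# Illustrative Kafka plumbing between pipeline components (not the platform's code).
import json

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "s3-reader-event",                                   # placeholder upstream topic
    bootstrap_servers="localhost:9092",                  # placeholder broker
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    record = message.value
    record.pop("Gender", None)                           # Delete Column transform
    # Delete Empty Rows transform: skip records missing either CTC value
    if record.get("Previous CTC") in (None, "") or record.get("Offered CTC") in (None, ""):
        continue
    producer.send("data-preparation-event", value=record)   # placeholder downstream topic
```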

Add the AutoML Component

· Navigate to the search bar of the Components Palette and search for the AutoML component.

· Drag and drop the AutoML component onto the pipeline canvas:

o Set the Invocation Type to Batch in the Basic Information panel.

o Move to the Meta Information tab and select the registered model from the Model Name dropdown.

o Save the component once done.

· From the Event Panel next to the Components Palette, click the ‘+’ icon to add a Kafka Event, then drag and drop the event onto the pipeline canvas. (A batch-scoring sketch of what this component does follows this section.)
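In batch mode the AutoML component essentially loads the registered model and scores the incoming records. A hedged sketch of that step, reusing the placeholder artifact and file names from the earlier sketches:

```python
# Illustrative batch scoring with the registered model.
import joblib
import pandas as pd

model = joblib.load("hiring_model.joblib")            # placeholder registered-model artifact
batch = pd.read_csv("hiring_data_prepared.csv")       # placeholder for records from the upstream event

# Encode features and align the dummy columns with those seen at training time.
features = pd.get_dummies(batch.drop(columns=["Gender"], errors="ignore"))
features = features.reindex(columns=model.feature_names_in_, fill_value=0)

batch["prediction"] = model.predict(features)
batch.to_csv("hiring_predictions.csv", index=False)   # placeholder output for the DB Writer sketch
```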

Add DB Writer Component to Store Final Results

· Drag and drop the DB Writer component from the Writer section of the Components Palette.

· Set the Invocation Type to Batch and move to the Meta Information tab.

· Enter the required DB credentials in the Meta Information tab:

o Host, Port, Database Name, Table Name, Username, Password.

· Use ClickHouse as the driver and Append as the save mode.

· Validate the connection and save the component. (A sketch of the equivalent ClickHouse insert follows this section.)
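The DB Writer step corresponds roughly to appending a DataFrame to the target ClickHouse table. Below is a sketch using the clickhouse-connect client; the connection details and table name are placeholders for the values entered in the Meta Information tab:

```python
# Illustrative append of the scored records into ClickHouse.
import clickhouse_connect
import pandas as pd

results = pd.read_csv("hiring_predictions.csv")       # placeholder AutoML output

client = clickhouse_connect.get_client(
    host="your-clickhouse-host",                      # Host (placeholder)
    port=8123,                                        # Port (placeholder, HTTP default)
    username="your-user",
    password="your-password",
    database="your-database",
)

# Append save mode: insert new rows without touching existing data.
# Assumes the target table already exists with matching columns.
client.insert_df("hiring_predictions", results)       # Table Name (placeholder)
```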

Activate the Pipeline

· Click the Activate icon located in the toolbar.

· Once activated, the pipeline will begin executing and the pods will start deploying.

· After all pods are up and running, move to the Logs section to track pipeline execution in detail.

· Check the Component Logs to monitor the status of each component.

· Monitor the logs as the data flows from S3 → Data Prep → AutoML → DB Writer.

· Users can view the sample records within the respective component events to verify data flow and transformations at each stage.

· Click on the event, go to the Preview tab, and view the sample data processed by the component.

· Check the Logs Panel on the right to troubleshoot issues or verify progress.

· Confirm that data ingestion and transformation have been completed successfully.

· The AutoML component will generate the results, and the DB Writer will load them into the target database.

· Watch for confirmation messages such as “DB Writer written data successfully” to ensure proper execution.

· Once the pipeline run is complete, deactivate the pipeline to avoid unnecessary compute usage.

· The processed data is now available for dashboards, analytics, and further reporting.
