This section describes how to create a Project or a Feature Store.
This page explains how to create a Data Science Lab (DSL) Project.
A Data Science Project created inside the Data Science Lab is like a Workspace inside which the user can create and store multiple data science experiments and their associated artifacts.
Check out the given illustration on how to create a DSL Project.
Pre-requisite: Users must have the following Admin-level settings configured to access and use the Repo Sync Project functionality inside the DS Lab module.
Configuring the DS Lab Settings option is mandatory before creating a Data Science Project.
Also, use the Algorithms field in the DS Lab Settings section to select the algorithms you wish to use for your DS Lab project.
The user must complete the following Version Control settings:
The token key has to be configured for the DS Lab module.
The repository and branch have to be specified to save the settings.
The user must complete the following Custom Field Settings:
Token key: bdbvcstoken
User id key: bdbvcsuserid
The user must complete the following User-level configuration to create a Repo Sync DS Lab project.
Navigate to the Home page of the Data Science Lab module.
Click the Create icon from the homepage.
The Create Project or Feature Store drawer opens.
Click the Create option provided for the Project.
The Create Project page opens, where the information for a new Project is provided.
Provide the following details for a new project:
Project Name: Give a name to the new project.
Project Description: Describe the project.
Select Algorithms: Select algorithms using the drop-down menu.
Environment: Allows users to select the environment they want to work in. The currently supported Python frameworks are Sklearn (default), TensorFlow, and PyTorch. Sklearn commands can be executed in the notebook by default.
Users who select the TensorFlow environment do not need to explicitly install packages such as TensorFlow and Keras in the notebook. These packages can be imported directly inside the notebook.
Users who select the PyTorch environment do not need to explicitly install packages such as Torch and Torchvision in the notebook. These packages can be imported directly inside the notebook.
The users can select an option from the given choices: 1. Python Tensor Flow, 2. Python PyTorch
Resource Allocation: Allows users to allocate the CPU/GPU and memory to be used by the Notebook container inside a given project. The currently supported Resource Allocation options are Low, Medium, and High.
Idle Shutdown: It allows the users to specify the idle time limit after which the notebook session will get disconnected, and the project will be deactivated. To use the notebook again, the project should be activated. The supported Idle Shutdown options are 30m, 1h, and 2h.
External Libraries: Mention the names of the external libraries that must be installed in your DSL project/notebook. If a specific version is required, mention the library name with the version number. The library names must be separated only by commas (without spaces). This is an optional field.
After you fill in the mandatory fields, the following modifiable fields appear with pre-selected values:
Image Name
Image Version
Limit
Memory
Request (CPU)
Memory
Git Project: Select a project from the drop-down menu.
Git Branch: Select a branch from the drop-down menu (the supported branches are main, migration, and version).
GPU Type: Select the GPU type from the drop-down menu (Nvidia is the currently supported GPU type).
GPU Limit: Set the GPU limit using this field (this field appears only after a GPU Type option is selected).
Sync git repo at project creation: Select this checkbox to sync the Git repository while creating the DS Lab project.
Please Note:
Click the Save option.
The confirmation message appears.
The newly created project gets saved, and it appears on the screen.
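The External Libraries field described above expects library names separated by commas with no spaces. The following is a minimal sketch of assembling such a value in Python. The helper name `format_external_libraries` is hypothetical, and the `name==version` form is an assumption borrowed from pip-style pinning, because the exact version syntax accepted by the field is not stated here.

```python
def format_external_libraries(libraries):
    """Join library names into a comma-separated string with no spaces.

    `libraries` maps a library name to a version string, or to None when
    no specific version is needed. The `name==version` form is an
    assumption (pip-style pinning), not confirmed platform syntax.
    """
    entries = []
    for name, version in libraries.items():
        name = name.strip()
        if not name:
            raise ValueError("library name must not be empty")
        if "," in name or " " in name:
            raise ValueError(f"invalid library name: {name!r}")
        entries.append(f"{name}=={version.strip()}" if version else name)
    return ",".join(entries)


# Example library names chosen only for illustration.
value = format_external_libraries({"xgboost": None, "lightgbm": "4.3.0"})
print(value)  # xgboost,lightgbm==4.3.0
```

The resulting string can be pasted into the External Libraries field. Rejecting names that contain spaces or commas up front keeps the separator rule from being broken by accident.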
A DSL Project displays the status of its container at the top right of the header panel.
The user receives all container status updates for a specific DSL Project through color-coded messages. After creating and opening a new project, the user sees these status messages at the top right of the page.
Steps to see the container message:
Open an active Data Science Project.
The user is redirected to create or import a Notebook.
The container status message gets displayed on the top right side of this screen.
The following status messages are displayed until the container is created and reaches the running status.
Please Note: A container status message appears when the container is not available. An error message also appears to inform the user that the Project container is not up and running.
Container status message while the container is being created and is initializing.
Container status message when the container is running.
Please Note: The user can click the branch icon to get the latest branch-related configuration.
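Once the container is running, a notebook cell can confirm that the packages of the selected environment are importable without installing them. The sketch below uses only the standard-library function `importlib.util.find_spec`. The mapping `ENVIRONMENT_MODULES` and the helper `missing_modules` are hypothetical, and the import names (`sklearn`, `tensorflow`, `keras`, `torch`, `torchvision`) are assumed from the packages named in the Environment field description.

```python
import importlib.util

# Import names assumed for the packages each environment is described as bundling.
ENVIRONMENT_MODULES = {
    "Sklearn": ["sklearn"],
    "TensorFlow": ["tensorflow", "keras"],
    "PyTorch": ["torch", "torchvision"],
}


def missing_modules(environment):
    """Return the modules for `environment` that cannot be found in this kernel."""
    return [
        name
        for name in ENVIRONMENT_MODULES[environment]
        if importlib.util.find_spec(name) is None
    ]


# Report, per environment, whether its modules are available in this kernel.
for env in ENVIRONMENT_MODULES:
    missing = missing_modules(env)
    print(env, "ready" if not missing else f"missing: {', '.join(missing)}")
```

Because `find_spec` only locates a module without importing it, the check is quick even for large frameworks.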
A Feature Store is a centralized repository for storing, managing, and sharing machine learning (ML) features or attributes used to train models. It is a scalable solution for organizing and cataloging features, making them easily accessible to data scientists and ML engineers across an organization. Feature Stores facilitate collaboration, version control, and reusability of features, streamlining the ML development process and improving model quality and efficiency.
Check out the illustration to create a new Feature Store.
Navigate to the Home page of the Data Science Lab module.
Click the Create icon from the homepage.
The Create Project or Feature Store drawer opens.
Click the Create option provided for the Feature Store.
The Create Feature Store page opens.
Provide a name for the Feature Store.
Select a Data Connector from the drop-down list.
The Table info/ metadata panel will appear on the right side of the page.
Click on a table name to select it.
An SQL query for the selected table is generated in the space provided.
Click the Validate option.
A notification message confirms that the query has been executed successfully.
A preview of the table appears below.
Click the Create option.
A notification message confirms that the Feature Store is being created.
The user gets redirected to the Feature Stores page.
The newly created Feature Store gets added at the top of the list.
Please Note:
Click the Refresh icon to get status updates for the newly created Feature Store.
A Feature Store passes through the Initializing, Started, and Completed statuses.
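The Validate step in the procedure above runs the generated query and shows a preview of the result. The same idea can be tried outside the platform. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a Data Connector. The table `customer_features`, its columns, and the helper `validate_and_preview` are hypothetical, and this is not a description of how the platform itself validates queries.

```python
import sqlite3


def validate_and_preview(connection, query, limit=5):
    """Run `query` and return (column_names, first `limit` rows).

    Raises sqlite3.Error when the query is invalid, which plays the
    role of a failed validation in this sketch.
    """
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchmany(limit)


# In-memory stand-in for a data connector; the table is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_features (customer_id INTEGER, total_orders INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_features VALUES (?, ?)",
    [(1, 12), (2, 3), (3, 7)],
)

columns, rows = validate_and_preview(
    conn, "SELECT * FROM customer_features ORDER BY customer_id"
)
print(columns)  # ['customer_id', 'total_orders']
print(rows)     # [(1, 12), (2, 3), (3, 7)]
```

Validating before creating the Feature Store surfaces a misspelled table or column name as an error at this step rather than later in the job.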
Check out the illustration on scheduling a Feature Store.
Navigate to the Data Science Lab module.
Click the Create option provided for Feature Store.
The Create Feature Store form opens.
Provide the Featureset Name.
Select a connector using the drop-down menu.
Write an SQL query, or generate one by selecting a table from the Table info/metadata panel.
Validate the query using the Validate option.
A notification confirms that the query has been validated.
Click the Schedule option.
The Schedule page appears.
Select an option for the Concurrency Policy. The following options are provided:
Allow (Parallel): Multiple instances can run simultaneously, with no concurrency restrictions. Suitable for independent tasks.
Forbid (Prevent, Deny): Only one instance runs at a time. A new instance is skipped if a previous one is still running. Suitable for tasks that cannot run in parallel.
Replace (Terminate, ReplaceOlder): A new instance starts and the previous one is terminated, which ensures no overlap. Suitable when the latest instance should take priority.
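The three policies differ only in what happens when a new run becomes due while a previous run is still in progress. The following sketch models that decision in Python. The names `ConcurrencyPolicy` and `on_schedule_tick` are hypothetical, and the behavior is modelled on the descriptions above rather than on the platform's scheduler code.

```python
from enum import Enum


class ConcurrencyPolicy(Enum):
    ALLOW = "Allow"
    FORBID = "Forbid"
    REPLACE = "Replace"


def on_schedule_tick(policy, running, new_run):
    """Return the instances that are running after `new_run` becomes due.

    `running` lists the instances still in progress when the schedule fires.
    """
    if policy is ConcurrencyPolicy.ALLOW:
        # The new instance runs alongside the existing ones.
        return running + [new_run]
    if policy is ConcurrencyPolicy.FORBID:
        # The new instance is skipped while anything is still running.
        return running if running else [new_run]
    if policy is ConcurrencyPolicy.REPLACE:
        # Existing instances are terminated in favor of the new one.
        return [new_run]
    raise ValueError(f"unknown policy: {policy!r}")


for policy in ConcurrencyPolicy:
    print(policy.value, on_schedule_tick(policy, ["run-1"], "run-2"))
# Allow ['run-1', 'run-2']
# Forbid ['run-1']
# Replace ['run-2']
```

When nothing is running at the scheduled time, all three policies simply start the new instance.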
Navigate to the Cron Generator section.
Choose the Monthly or Yearly option and provide the required information.
Based on the selection made in the Cron Generator, the Scheduler Time is added.
Click the Apply option.
The user is redirected to the Create Feature Store page, and a notification confirms that the Feature Store is scheduled.
The same will be indicated through a green mark in the Scheduler option.
Click the Create option.
The user gets redirected to the Feature Stores page.
The newly created Feature Store is added at the top of the page.
A notification message confirms that the Feature Store job is initialized. The Status column shows the same.
Click the Refresh icon.
The Feature Store status changes to Started.
Click the Refresh icon.
The Feature Store status changes to Completed.
The Stop Scheduling icon gets enabled for the Feature Store.
Please Note: The Stop Schedule option will remain enabled when a scheduled Feature Store reaches the scheduled time limit. The user can click the Stop Schedule icon during this period to stop the schedule.
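The Cron Generator step in the scheduling procedure above turns a Monthly or Yearly selection into the Scheduler Time. As a rough illustration, the sketch below builds such expressions in Python, assuming the standard five-field cron format (minute, hour, day of month, month, day of week). The format actually produced by the Cron Generator is not confirmed here, and the helper names `monthly_cron` and `yearly_cron` are hypothetical.

```python
def _check(value, low, high, label):
    """Raise ValueError unless low <= value <= high."""
    if not low <= value <= high:
        raise ValueError(f"{label} must be between {low} and {high}, got {value}")


def monthly_cron(day, hour, minute):
    """Five-field cron expression for `day` of every month at hour:minute."""
    _check(day, 1, 31, "day")
    _check(hour, 0, 23, "hour")
    _check(minute, 0, 59, "minute")
    return f"{minute} {hour} {day} * *"


def yearly_cron(month, day, hour, minute):
    """Five-field cron expression for one run per year on month/day at hour:minute."""
    _check(month, 1, 12, "month")
    _check(day, 1, 31, "day")
    _check(hour, 0, 23, "hour")
    _check(minute, 0, 59, "minute")
    return f"{minute} {hour} {day} {month} *"


print(monthly_cron(day=1, hour=2, minute=30))          # 30 2 1 * *
print(yearly_cron(month=12, day=25, hour=0, minute=0))  # 0 0 25 12 *
```

Reading an expression field by field in this way makes it easier to check that the Scheduler Time shown after clicking Apply matches the intended schedule.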
You can enable the Sync git repo at project creation option to make your DSL Project a Git Repo Sync Project. Repo Sync Projects are displayed in the Project list with a branch icon in their title.
You can configure the Git access for a normal Data Science Lab project by configuring the Git Repository and Git Branch fields while creating a new project. Such projects will display the branch icon without the drop-down option while opening that project. For example,