Data Science Modelling Considerations

The Data Science Lab (DSL) module of the BDB Platform offers a comprehensive environment for data scientists to develop, manage, and deploy machine learning models efficiently.

Key Capabilities

  • Data Science Projects: Create and manage projects with customizable settings such as resource allocation, environment selection, and external library integration.

    • GPU support is available for deep learning models and tasks that involve large amounts of data or complex computations.

    • Users select the environment to work in. The Python frameworks currently supported are Sklearn (the default), TensorFlow, PyTorch, and PySpark.

  • Notebook Integration: Develop and execute code using integrated Jupyter Notebooks, with options to create new notebooks or upload existing ones for seamless collaboration.

  • Model Development and Deployment: Build machine learning models within notebooks, save them for future use, and deploy them as APIs for integration into applications or workflows.

  • AutoML Capabilities: Use the built-in AutoML features to automate model selection and hyperparameter tuning, streamlining the development process. Depending on the data, an experiment can be created as a regression, forecasting, or classification task.

  • Feature Store: Provides a central location for building point-in-time correct training data from multiple data sources.
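
To make "point-in-time correct" concrete, the sketch below joins each training label with the most recent feature value that was known at or before the label's timestamp, so no future information leaks into the training set. It is only an illustration in plain Python, not the DSL Feature Store API; the function name `point_in_time_join` and the sample records are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def point_in_time_join(labels, features):
    """Attach to each label row the latest feature value known at or before its date.

    labels:   list of (entity_id, event_date, label)
    features: list of (entity_id, feature_date, value)
    """
    # Group feature observations per entity, ordered by date.
    history = {}
    for entity_id, feature_date, value in sorted(features, key=lambda r: (r[0], r[1])):
        dates, values = history.setdefault(entity_id, ([], []))
        dates.append(feature_date)
        values.append(value)

    rows = []
    for entity_id, event_date, label in labels:
        dates, values = history.get(entity_id, ([], []))
        # Index of the last feature_date <= event_date; later values are ignored.
        idx = bisect_right(dates, event_date) - 1
        feature = values[idx] if idx >= 0 else None
        rows.append((entity_id, event_date, feature, label))
    return rows


# Hypothetical sample data: feature snapshots and labelled events.
features = [
    ("c1", date(2024, 1, 1), 10.0),
    ("c1", date(2024, 2, 1), 20.0),
    ("c2", date(2024, 1, 15), 5.0),
]
labels = [
    ("c1", date(2024, 1, 20), 0),
    ("c1", date(2024, 2, 10), 1),
    ("c2", date(2024, 1, 10), 0),
]

training_rows = point_in_time_join(labels, features)
for row in training_rows:
    print(row)
```

The third label has no feature snapshot on or before its date, so its feature is left as `None` instead of borrowing the later value.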

Models created in the Data Science Lab can be productionized by exporting them and using them in data pipelines. Depending on the requirement, a pipeline can use Script Runner, DSLab Runner, or AutoML Runner. Models can also be exposed as APIs.
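
As a rough illustration of the build, save, and reuse cycle, the sketch below trains a scikit-learn model, serializes it, and reloads it for scoring, as a downstream pipeline step might. It assumes a standard scikit-learn environment and uses Python's `pickle` module as a stand-in for persistence; the platform's own save and export functions are not shown here and may work differently.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Train a small classifier, as one might inside a notebook.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Serialize the fitted model (assumption: pickle stands in for the
# platform's own model-saving mechanism).
model_bytes = pickle.dumps(model)

# Later, for example in a pipeline step, restore the model and score new data.
restored = pickle.loads(model_bytes)
predictions = restored.predict(X_test)
accuracy = restored.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```

A pickled model should be loaded with the same library versions it was saved with, and only from trusted sources, because unpickling can execute arbitrary code.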
