The BDB Data Science Lab is a collaborative hub for data scientists. Within this module, team members can run experiments together and share Notebooks, models, and other artifacts with their team. Shared resources can then be validated and deployed to the Production environment.
A Data Science Project created inside the Data Science Lab is like a Workspace inside which the user can create and store multiple data science experiments and their associated artifacts.
Please Note: The user can create a new Notebook or upload an existing one only after activating a Data Science Project.
A Feature Store is a centralized repository for storing, managing, and serving features used in machine learning models. It plays a crucial role in the machine learning lifecycle by providing a consistent and efficient way to manage features. It is a scalable solution for organizing and cataloging features, making them easily accessible to data scientists and ML engineers across an organization.
Feature Stores facilitate collaboration, version control, and reusability of features, streamlining the ML development process and improving model quality and efficiency.
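The idea of storing, versioning, and serving features from one place can be illustrated with a small in-memory sketch. This is not the Data Science Lab Feature Store API; the class `InMemoryFeatureStore`, its methods, and the feature name `avg_order_value` are hypothetical and exist only to show the concept.

```python
class InMemoryFeatureStore:
    """Toy feature store (hypothetical): versioned values per feature and entity."""

    def __init__(self):
        # feature name -> entity id -> list of values, one entry per version
        self._data = {}

    def put(self, feature, entity_id, value):
        """Store a new version of a feature value; return its 1-based version."""
        versions = self._data.setdefault(feature, {}).setdefault(entity_id, [])
        versions.append(value)
        return len(versions)

    def get(self, feature, entity_id, version=None):
        """Return the latest value, or a specific 1-based version if given."""
        versions = self._data.get(feature, {}).get(entity_id, [])
        if not versions:
            raise KeyError(f"no value for {feature!r} / {entity_id!r}")
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"version {version} does not exist")
        return versions[version - 1]

    def list_features(self):
        """Catalog of registered feature names, sorted for stable output."""
        return sorted(self._data)


store = InMemoryFeatureStore()
store.put("avg_order_value", "customer_42", 18.5)
store.put("avg_order_value", "customer_42", 21.0)

latest = store.get("avg_order_value", "customer_42")
first = store.get("avg_order_value", "customer_42", version=1)
```

Because every consumer reads through `get`, two models that request the same feature receive the same value, which is the consistency property described above.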
A Workspace in a Data Science module provides a cohesive and integrated environment that supports the end-to-end data science workflow, from data ingestion and processing to analysis, model building, and deployment.
The Workspace is the place to create and save data science experiments inside the Data Science Lab module.
The Workspace is the tab that opens by default for each Data Science Lab project.
A Data Science Notebook is an interactive and collaborative digital platform used by data scientists and analysts for data exploration, analysis, modeling, and visualization. It combines executable code, visualizations, and explanatory text in a flexible and shareable format, making it a versatile tool for data science projects. Key features include code execution, rich text and visualizations, interactive data exploration, collaboration and sharing, reproducibility and documentation, and integration with data science libraries and tools.
In the current Data Science Lab module, a .ipynb file that is created or imported inside a project serves as a Data Science Notebook. The Workspace tab of a Data Science Project lists these Notebooks in a Repo folder.
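A .ipynb file is a JSON document, so its cells can be inspected with ordinary tools. The sketch below builds a minimal notebook in the standard Jupyter nbformat 4 layout and extracts its code cells. The notebook content and the helper `code_cells` are illustrative assumptions, not something produced by the Data Science Lab.

```python
import json

# Minimal notebook in nbformat 4.4 layout: one markdown cell, one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Sales exploration"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["total = 2 + 3\n", "print(total)"],
        },
    ],
}


def code_cells(nb):
    """Return the source text of every code cell, in notebook order."""
    return ["".join(c["source"]) for c in nb["cells"] if c["cell_type"] == "code"]


# Round-trip through JSON text, as happens when a .ipynb file is saved and reopened.
serialized = json.dumps(notebook, indent=1)
restored = json.loads(serialized)
sources = code_cells(restored)
```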
A dataset in data science is a structured collection of data used for analysis and modeling. It represents a specific domain or problem and can include various data types. Datasets are essential for tasks like data analysis, modeling, and extracting insights in both supervised and unsupervised learning. They can be sourced from different domains and collected from surveys, experiments, or existing databases. Datasets contain features and labels for supervised learning, while they are unlabeled for unsupervised learning. They are typically split into training and test sets. Publicly available datasets are widely used for research and benchmarking. Datasets form the foundation for various data science tasks and enable solving complex problems.
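The split into training and test sets can be sketched with the standard library alone. The function name `train_test_split`, the 30% test fraction, and the toy rows are assumptions chosen for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train_rows, test_rows)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Labeled rows for supervised learning: (feature, label) pairs.
dataset = [(hours, hours >= 5) for hours in range(10)]
train, test = train_test_split(dataset, test_fraction=0.3, seed=42)
```

Fixing the seed keeps the split identical between runs, so a model evaluated today and next week is scored on the same held-out rows.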
The Data Set tab provided under the Data Science Lab module supports the following types of datasets:
Dataset - A table or filtered data from a database.
Data Sandbox - Files (Excel, CSV, text, etc.) that are uploaded or appended to the Data Sandbox folder from a local directory.
A data science model refers to a mathematical or computational representation of a real-world phenomenon or problem that data scientists use to make predictions, gain insights, or automate decision-making processes. It is a key component of the data science workflow and is built using data, statistical techniques, and algorithms.
The Model tab under a Data Science Project includes:
Imported Models: Models trained using external tools and libraries, which are brought into the data science workflow for analysis or prediction tasks.
Models created in Data Science Lab Notebook: Models built and trained within the Data Science Lab Notebook environment, utilizing its features and capabilities.
AutoML Models: Models generated through automated machine learning (AutoML) techniques, which automatically search and select the best model based on the given data and desired outcome.
The Utility tab lets the user create and list Python scripts (.py files) that can be imported into a Notebook.
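Importing a utility script into a Notebook follows the usual Python module mechanics. How the Data Science Lab makes Utility scripts visible to a Notebook is not described here, so the sketch below simulates it with a temporary directory; the module name `cleaning_utils` and the function `normalize_name` are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the location of utility scripts (assumption for this example).
utility_dir = Path(tempfile.mkdtemp())
(utility_dir / "cleaning_utils.py").write_text(
    "def normalize_name(raw):\n"
    "    \"\"\"Trim whitespace, lowercase, and replace spaces with underscores.\"\"\"\n"
    "    return raw.strip().lower().replace(' ', '_')\n"
)

# Make the directory importable, then import the script as a regular module.
sys.path.insert(0, str(utility_dir))
importlib.invalidate_caches()
cleaning_utils = importlib.import_module("cleaning_utils")

column = cleaning_utils.normalize_name("  Order Date ")
```

Keeping helpers such as `normalize_name` in a .py file rather than in a Notebook cell lets several Notebooks reuse the same code.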
AutoML (Automated Machine Learning) refers to the automated process of building and optimizing machine learning models without extensive manual intervention. It leverages intelligent algorithms and techniques to automate tasks such as data preprocessing, feature selection, model selection, hyperparameter tuning, and model evaluation. AutoML aims to simplify and accelerate the model development process, enabling users with limited machine learning expertise to create effective models efficiently.
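The model selection and hyperparameter tuning steps can be sketched as a loop over a small search space that keeps the candidate with the lowest validation error. This is a toy illustration, not the algorithm used by the Auto ML tab; the two candidate models, the `SEARCH_SPACE` list, and the synthetic data are assumptions.

```python
def mean_model(train, _param):
    """Baseline: always predict the mean training label."""
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg


def knn_model(train, k):
    """k-nearest-neighbours regression on a single numeric feature."""
    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def mse(model, rows):
    """Mean squared error of a model on (x, y) rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


# Each entry: (model name, builder function, hyperparameter values to try).
SEARCH_SPACE = [
    ("mean", mean_model, [None]),
    ("knn", knn_model, [1, 3, 5]),
]


def auto_select(train, valid):
    """Try every model and hyperparameter; return (score, name, param) of the best."""
    best = None
    for name, builder, params in SEARCH_SPACE:
        for param in params:
            score = mse(builder(train, param), valid)
            if best is None or score < best[0]:
                best = (score, name, param)
    return best


# Synthetic data: y = 2x + 1, with every fourth row held out for validation.
data = [(x, 2.0 * x + 1.0) for x in range(20)]
train = [row for i, row in enumerate(data) if i % 4 != 0]
valid = [row for i, row in enumerate(data) if i % 4 == 0]

score, name, param = auto_select(train, valid)
```

Real AutoML systems search far larger spaces and also automate preprocessing and feature selection, but the core loop of "fit, score on held-out data, keep the best" is the same.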
The Auto ML tab lets users create data science experiments and lists the experiments that already exist.