Operations

This section describes the various operations available within a Data Science Project.

Note: The available operations vary depending on the environment selected when the project is created.

  • Projects created under the PySpark environment support only a subset of operations: Data, Secrets, Variable Explorer, and Writers.

  • Projects created under Python TensorFlow or Python PyTorch environments support the broader set of operations, including Transforms, Models, and Artifacts for machine learning workflows.

Operations in a Data Science Notebook

1. Data

  • Add data to the notebook for analysis and modeling.

  • View a list of all datasets added to the notebook.

  • Supports multiple file types and sources; a minimal loading sketch follows this list.
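
A minimal loading sketch, assuming an added CSV dataset is exposed as a file in the notebook's working directory (the file name sales.csv is hypothetical):

    import pandas as pd

    # Hypothetical file name; the actual location of an added dataset
    # depends on how the platform mounts it into the notebook.
    df = pd.read_csv("sales.csv")
    print(df.shape)
    print(df.head())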

2. Secrets

  • Create and manage environment variables to store confidential information securely.

  • Prevent sensitive data such as API keys, passwords, or tokens from being exposed in notebook code, as shown in the sketch below.
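
Because secrets are surfaced as environment variables, they can be read with the standard library instead of being hard-coded; the secret name API_KEY below is a hypothetical example:

    import os

    # Read a secret created via the Secrets operation; never hard-code the value.
    api_key = os.environ.get("API_KEY")
    if api_key is None:
        raise RuntimeError("Secret API_KEY is not set for this notebook")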

3. Algorithms

  • Access algorithm settings at the project level.

  • Configure and use machine learning algorithms directly inside the notebook.

  • Share algorithm configurations across all notebooks in the project; an illustrative configuration follows this list.
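
The platform's own configuration schema is not documented here; as a generic illustration with scikit-learn, a shared algorithm configuration might map onto an estimator like this:

    from sklearn.ensemble import RandomForestClassifier

    # Illustrative settings only; the Algorithms operation stores values like
    # these at the project level so every notebook resolves the same configuration.
    algo_config = {"n_estimators": 200, "max_depth": 10, "random_state": 42}
    model = RandomForestClassifier(**algo_config)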

4. Transforms

  • Save and load models using transform scripts.

  • Register models or publish them as APIs through the DS Lab module.

  • Supports reproducible data transformations and workflow integration, as sketched below.
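
A minimal sketch of a reproducible transform using scikit-learn and joblib; the file name and workflow are illustrative, since the platform's own transform-script format may differ:

    import joblib
    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

    # Fit a reusable transformation and persist it to disk.
    transform = Pipeline([("scale", StandardScaler())])
    transform.fit(X_train)
    joblib.dump(transform, "transform.pkl")

    # Later (or in another notebook), reload and apply the identical transform.
    transform = joblib.load("transform.pkl")
    X_scaled = transform.transform(X_train)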

5. Models

  • Train, save, and load models using frameworks such as Scikit-learn, TensorFlow/Keras, and PyTorch.

  • Register models for use in pipelines or shared environments.

  • For detailed instructions, refer to Model Creation using Data Science Notebook; a minimal train/save/load sketch follows this list.
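
A minimal train/save/load cycle with scikit-learn and joblib; registering the saved model or publishing it as an API is then done through DS Lab rather than in code:

    import joblib
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    # Train a model inside the notebook.
    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000).fit(X, y)

    # Save the trained model, then reload it for reuse or registration.
    joblib.dump(model, "model.pkl")
    model = joblib.load("model.pkl")
    print(model.score(X, y))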

6. Artifacts

  • Save plots, visualizations, and datasets as artifacts inside the notebook.

  • Artifacts provide a way to store and reuse results generated during experiments (see the sketch below).
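
A sketch of producing a plot and a dataset as files; how saved files are promoted to artifacts is platform-specific, so only the file-producing side is shown:

    import matplotlib
    matplotlib.use("Agg")  # render without a display
    import matplotlib.pyplot as plt
    import pandas as pd

    # Save a plot to an image file.
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 8])
    fig.savefig("growth.png")

    # Save a small result set alongside it.
    pd.DataFrame({"x": [1, 2, 3], "y": [2, 4, 8]}).to_csv("results.csv", index=False)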

7. Variable Explorer

  • Inspect detailed information about variables declared in the notebook.

  • Monitor variable types, values, and memory usage for debugging and analysis; the sketch below shows the equivalent introspection in code.
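
The panel itself requires no code; for reference, the equivalent introspection in plain Python looks like this:

    import sys

    counts = [1, 2, 3]
    label = "experiment-1"

    # Print the same details the Variable Explorer surfaces: name, type,
    # approximate memory usage, and current value.
    for name, value in [("counts", counts), ("label", label)]:
        print(name, type(value).__name__, sys.getsizeof(value), repr(value))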

8. Writers

  • Write experiment outputs to supported database writers.

  • Supports batch and incremental writes for integration with downstream pipelines or analytics systems, as sketched below.
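
A generic sketch using pandas and SQLAlchemy; the connection string and table name are hypothetical, and in practice the writer configured for the project supplies the connection details:

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection string; substitute your configured writer's target.
    engine = create_engine("postgresql://user:password@host:5432/analytics")

    results = pd.DataFrame({"run_id": [1], "accuracy": [0.93]})
    # if_exists="append" gives incremental writes; "replace" rewrites in batch.
    results.to_sql("experiment_results", engine, if_exists="append", index=False)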


Notes

  • The available operations are context-sensitive, depending on the environment in which the notebook is created.

  • Operations such as Transforms, Models, and Artifacts are available only in the TensorFlow and PyTorch environments.

  • PySpark notebooks are limited to the Data, Secrets, Variable Explorer, and Writers operations.