Writers

Using the Writers Tab in Data Science Lab

The Writers tab in a Data Science Lab (DSLab) Notebook lets users write the output of data science experiments directly to supported databases. DataFrames and experiment results can be stored in persistent storage from within the notebook, ready for downstream analytics or pipeline integration.

Note: Supported database writers include MySQL, MSSQL, Oracle, MongoDB, PostgreSQL, and ClickHouse.

Steps to Use the Writers Tab

Navigation path: Data Science Lab > Workspace > Writers

1. Prepare the Dataset

  1. Navigate to a code cell containing the dataset details.

  2. Run the code cell to preview the dataset.
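
As a minimal sketch of such a cell, assuming the dataset is held in a pandas DataFrame. The variable name `df`, the column names, and the sample values are hypothetical and only stand in for your own experiment output:

```python
import pandas as pd

# Hypothetical experiment output; in practice this is the DataFrame
# your notebook has already produced.
df = pd.DataFrame(
    {
        "customer_id": [101, 102, 103],
        "churn_score": [0.12, 0.87, 0.45],
    }
)

# Preview the first rows and the column types before writing anywhere.
print(df.head())
print(df.dtypes)
```

Checking the column types at this point helps catch values that would map poorly to the target table's schema.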

2. Access Secrets for Database Credentials

  1. Click the Secrets tab in the right-side panel.

  2. Select the registered database secret keys relevant to your database connection.

  3. Add a new code cell to retrieve the secret values.

  4. Use the checkboxes to select the required secret keys (e.g., username, password, host, port).
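
The code that DSLab generates for secret retrieval is not reproduced here. As a rough stand-in, the sketch below reads the same kinds of values from environment variables. The key names (`DB_USERNAME`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`) and the helper `load_db_secrets` are hypothetical, not part of the DSLab API:

```python
import os

# Hypothetical key names; use the names of your registered secrets.
REQUIRED_KEYS = ("DB_USERNAME", "DB_PASSWORD", "DB_HOST", "DB_PORT")


def load_db_secrets(env=os.environ):
    """Collect connection secrets, failing early if any are missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing secret keys: {', '.join(missing)}")
    return {
        "username": env["DB_USERNAME"],
        "password": env["DB_PASSWORD"],
        "host": env["DB_HOST"],
        "port": int(env["DB_PORT"]),  # ports arrive as strings
    }
```

Failing early on a missing key gives a clearer error than a connection failure later in the writer cell.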

3. Configure the Writers

  1. Add another code cell in the notebook.

  2. Open the Writers section from the left-side panel of Workspace.

  3. Select the driver type for your database using the provided checkboxes.

  4. The system auto-generates the writer code in the new cell.

4. Provide Database Details

  1. Populate the required database parameters in the generated code cell:

    • Username

    • Password

    • Host

    • Port

    • Database Name

    • Table Name

    • DataFrame to be written
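
One way to keep these parameters together and check them before running the writer is sketched below. The `WriterParams` class and its methods are hypothetical helpers for illustration and are not part of the generated DSLab code; the host, database, and table values are placeholders:

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WriterParams:
    """Hypothetical container for the parameters listed above."""

    username: str
    password: str
    host: str
    port: int
    database: str
    table: str

    def validate(self):
        # Reject empty strings and out-of-range ports before connecting.
        for name in ("username", "password", "host", "database", "table"):
            if not getattr(self, name):
                raise ValueError(f"{name} must not be empty")
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")
        return self

    def summary(self):
        # Mask the password so the parameters can be printed safely.
        return {**asdict(self), "password": "***"}


params = WriterParams(
    username="analyst",
    password="not-a-real-password",
    host="db.example.internal",
    port=5432,
    database="analytics",
    table="churn_scores",
).validate()
print(params.summary())
```

In a real notebook the username, password, host, and port would come from the retrieved secrets rather than from literals in the cell.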

5. Execute the Writer Code

  1. Run the code cell with the modified database details.

  2. A message below the code cell will confirm that the DataFrame has been written to the selected database.
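
The writer code that DSLab generates is specific to the selected driver and is not shown here. To illustrate the general write-then-confirm pattern, the sketch below uses pandas `DataFrame.to_sql` with an in-memory SQLite database as a stand-in for the real target. SQLite is an assumption made so the example runs anywhere; it is not one of the supported writers, and the table name is hypothetical:

```python
import sqlite3

import pandas as pd

# Hypothetical DataFrame standing in for the experiment output.
df = pd.DataFrame(
    {
        "customer_id": [101, 102, 103],
        "churn_score": [0.12, 0.87, 0.45],
    }
)

# In-memory SQLite stands in for the real target database in this sketch.
conn = sqlite3.connect(":memory:")

# Write the DataFrame; if_exists="replace" drops and recreates the table.
df.to_sql("churn_scores", conn, if_exists="replace", index=False)

# Read the row count back to confirm the write.
result = pd.read_sql_query("SELECT COUNT(*) AS n FROM churn_scores", conn)
written = int(result["n"].iloc[0])
print(f"Wrote {written} of {len(df)} rows")

conn.close()
```

A read-back like this is optional, but it confirms that the row count in the table matches the DataFrame rather than relying on the confirmation message alone.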

Notes

  • Only supported database types can be used as writers: MySQL, MSSQL, Oracle, MongoDB, PostgreSQL, and ClickHouse.

  • Retrieving credentials through Secrets keeps them encrypted and out of the notebook code, instead of hardcoded in a cell.

  • The Writers functionality works directly with DataFrames generated in the notebook, so experiment output can be persisted for downstream pipelines without leaving the notebook.