# Datasets

{% hint style="success" %}
*Watch the walk-throughs below to see how to add a dataset from the Notebook page.*
{% endhint %}

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FBLGYLEkBUnc8nVEBAuEI%2Fuploads%2FI59LGHyVAHpESgb0o6g0%2FAdding%20Data%20set%20Notebook%20(PySpark).mp4?alt=media&token=a455d58c-5ea6-45a4-819f-3056c6decf1f>" %}
***Adding a Dataset to a Notebook***
{% endembed %}

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FBLGYLEkBUnc8nVEBAuEI%2Fuploads%2F6JfY902CZ4ha8vXqfvde%2FUploading%20Data%20Sandbox%20(Pyspark).mp4?alt=media&token=cf9ca091-54d1-4c0f-823f-561823e78b94>" %}
***Uploading a Data Sandbox File and Adding It to a Notebook***
{% endembed %}

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark> Datasets are added at the project level, even when they are added from a Notebook, so they are available to all the Notebooks under the same project.*
{% endhint %}
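
The project-level scope described in the note can be pictured with a small model. This is only an illustrative sketch: the `Project` and `Notebook` classes and the dataset name are hypothetical and are not part of the platform.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Project:
    """Hypothetical stand-in for a Data Science Lab project."""
    name: str
    datasets: Set[str] = field(default_factory=set)


@dataclass
class Notebook:
    """Hypothetical stand-in for a Notebook that belongs to a project."""
    name: str
    project: Project

    def add_dataset(self, dataset_name: str) -> None:
        # The dataset is stored on the project, not on the notebook.
        self.project.datasets.add(dataset_name)

    def available_datasets(self) -> List[str]:
        # Every notebook reads the same project-level collection.
        return sorted(self.project.datasets)


project = Project("demo_project")
nb_a = Notebook("notebook_a", project)
nb_b = Notebook("notebook_b", project)

# A dataset added from one notebook is visible from the other.
nb_a.add_dataset("sales_data")
print(nb_b.available_datasets())
```

Because both notebooks point at the same project object, a dataset added through one of them shows up in the other without any extra step.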

## Adding Datasets

* Navigate to the Notebook page.
* Click the ***Datasets*** tab.

<figure><img src="/files/K3E8OehmbQAUzPWPsrgR" alt=""><figcaption></figcaption></figure>

* Click on the ***Add Dataset*** option.

<figure><img src="/files/vJ3RtZMAdVD5eKQQo2pT" alt=""><figcaption></figcaption></figure>

* The ***Add Datasets*** page appears.
* Select the ***Data Sets*** or ***Data Sandbox*** option from the Data Source drop-down menu.
* Search for a dataset using the search bar.
* Select the required dataset(s) using the checkboxes. Multiple datasets or Data Sandbox files can be selected.
* Click the ***Add*** option.

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark> The **Add** option is displayed only after at least one dataset is selected.*
{% endhint %}

<figure><img src="/files/pLBbk5hkGCZGQ5T71USl" alt=""><figcaption><p><em><strong>The Add Datasets page</strong></em> </p></figcaption></figure>

* A notification message appears.
* The selected dataset(s) are added to the ***Datasets*** tab.

<figure><img src="/files/kHGYFZ18TpqXMlhgzHtz" alt=""><figcaption></figcaption></figure>

* Click the ***More*** icon for an added dataset.
* A drop-down menu appears, displaying the ***Preview*** and ***Data Preparation*** actions for the added dataset.

<figure><img src="/files/ND2u9G7UkKp9EvIOqFEH" alt=""><figcaption></figcaption></figure>

## Uploading Datasets (Data Sandbox)

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark> The **Upload** option is available for Data Sandbox files inside the Data Science Notebook.*
{% endhint %}

* Navigate to the ***Add Datasets*** panel from a Data Science Notebook.
* Select the ***Data Sandbox*** option as the Data Source.
* Click the ***Upload*** option.

<figure><img src="/files/U2Vv4ukTykJAuucHW0Cs" alt=""><figcaption></figcaption></figure>

* The ***Upload Data Sandbox*** window appears.
* Provide a Sandbox Name.
* Provide a description (optional).
* Use the ***Choose File*** option to select a file from the system.

<figure><img src="/files/bltIslKLY7esSjVAHi65" alt=""><figcaption></figcaption></figure>

* Select a file from the system.
* Once the selected file name appears next to the ***Choose File*** option, click the ***Save*** option to upload the file.

<figure><img src="/files/6ohil1EHGdL7dUsb4NvZ" alt=""><figcaption></figcaption></figure>

* A notification message appears to confirm that the upload is complete.
* The uploaded file is listed below with a checkbox to select it.

<figure><img src="/files/yxWWpQHW4meaapLrkIPN" alt=""><figcaption></figcaption></figure>

* Select the file using the checkbox.
* Click the ***Add*** option.

<figure><img src="/files/XvgOyM7YM72CS0QMrFhI" alt=""><figcaption></figcaption></figure>

* A notification message appears.
* The uploaded Data Sandbox dataset is added to the Notebook.

<figure><img src="/files/KQ6HBPZHxgzdr6M15R2B" alt=""><figcaption></figcaption></figure>

## Reading Datasets

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark>* Datasets and Data Sandbox files (CSV and XLSX files) can be read using the ***get\_data*** function.
{% endhint %}

* Add a new Code cell to the Notebook or use an empty Code cell.
* Select a dataset from the Datasets tab.
* The ***get\_data*** function appears in the code cell.

<figure><img src="/files/pEkmiyZBSHvtWljbz0bw" alt=""><figcaption></figcaption></figure>

* Provide ***df*** in the code cell to print the data from the selected dataset.
* Run the cell.
* The Data preview appears below.

<figure><img src="/files/Wp9VmNM3TsKU5MdLVqnS" alt=""><figcaption></figcaption></figure>
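
The pattern in the steps above (load the data into a variable, then reference that variable to display it) can be sketched as follows. The real ***get\_data*** call is generated by the Notebook, and its arguments and return type are not documented on this page, so the `get_data` stub, the dataset name, and the sample rows below are hypothetical stand-ins. The sketch also assumes that the generated call assigns its result to `df`.

```python
import csv
import io

# Hypothetical sample content standing in for an added CSV dataset.
SAMPLE_CSV = "id,city\n1,Pune\n2,Delhi\n"


def get_data(dataset_name):
    """Hypothetical stand-in for the platform-generated get_data call.

    The real function is inserted by the Notebook; this stub ignores the
    dataset name and only parses the in-memory sample above.
    """
    reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
    return list(reader)


# Step 1: load the selected dataset into a variable (assumed to be df).
df = get_data("sample_dataset")

# Step 2: reference df so that running the cell shows the data.
for row in df:
    print(row)
```

In an actual Notebook, only the reference to `df` needs to be typed; the loading line comes from selecting the dataset in the ***Datasets*** tab.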

{% hint style="info" %}
*<mark style="color:green;">Please Note:</mark>*

* Text files added as datasets to a Notebook are disabled for the data load function. Only the ***Copy Path*** option is provided for such datasets.
* *Refer to the* [***Data Science Lab Quick Start Flow***](https://docs.bdb.ai/data-science-lab-4/data-science-lab-quick-start-flow) *page for an overview of the **Data Science Lab** module in a nutshell.*
  {% endhint %}
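
Since text files expose only the ***Copy Path*** option, one possible way to work with them is to paste the copied path into a cell and read the file with standard Python I/O. The sketch below assumes that the copied path is readable from the Notebook's file system, which this page does not confirm. The file name and contents are hypothetical, and a temporary file is created so that the example runs anywhere.

```python
from pathlib import Path
import tempfile

# Create a throwaway text file so the sketch is self-contained; in a
# Notebook this would be the path obtained through the Copy Path option.
tmp_dir = tempfile.mkdtemp()
copied_path = Path(tmp_dir) / "notes.txt"  # hypothetical path
copied_path.write_text("first line\nsecond line\n", encoding="utf-8")

# Read the file with standard Python I/O instead of the data load function.
with open(copied_path, encoding="utf-8") as handle:
    lines = [line.rstrip("\n") for line in handle]

print(lines)
```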

