Sandbox Reader
The Sandbox Reader component reads data files stored in a configured sandbox environment. It supports multiple file formats, column filtering, partitioned reads, and Spark SQL queries for flexible data exploration.
Note: Before using the Sandbox Reader, upload a file to Data Sandbox under the Data Center module.
Configuration Sections
The Sandbox Reader component configurations are organized into the following sections:
Basic Information
Meta Information
Resource Configuration
Connection Validation
Using Sandbox Reader
Navigate to the Data Pipeline Editor.
Expand the Readers section in the component palette.
Drag and drop the Sandbox Reader into the workflow editor.
Click the component to open its configuration tabs.
Basic Information Tab
Invocation Type (*): Select how the reader runs: Real-Time or Batch. Example: Batch.
Deployment Type: Deployment type (pre-selected). Example: Kubernetes.
Container Image Version: Docker container image version (pre-selected). Example: v1.2.3.
Failover Event (optional): Select a failover event. Example: retry_event.
Batch Size (*): Maximum number of records processed per cycle (minimum: 10). Example: 1000.
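For orientation, the tab's selections can be pictured as a small settings map. A minimal Python sketch using the example values above (the key names are illustrative, not the component's actual configuration keys):

# Illustrative only: key names are hypothetical; values mirror the examples above.
basic_information = {
    "invocation_type": "Batch",           # Real-Time or Batch (mandatory)
    "deployment_type": "Kubernetes",      # pre-selected
    "container_image_version": "v1.2.3",  # pre-selected
    "failover_event": "retry_event",      # optional
    "batch_size": 1000,                   # minimum 10 (mandatory)
}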
Meta Information Tab
The Meta Information tab contains parameters that vary depending on the Storage Type and File Type selected.
Storage Type Options
Network (default): Reads files using folder paths.
Platform: Reads files directly from sandbox-managed datasets.
Network Mode
File Type: File type to read (CSV, JSON, PARQUET, AVRO, XML, ORC). Example: CSV.
Schema: Spark schema in JSON format (optional). Example: employee_schema.json.
Sandbox Folder Path: Folder path containing the part files. Example: /sandbox/sales/.
Limit: Maximum number of records to read. Example: 5000.
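The Network mode fields correspond closely to a plain Spark folder read. A minimal PySpark sketch under that assumption, using the example values above (the paths and schema file are illustrative; the actual component may load data differently):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
import json

spark = SparkSession.builder.appName("network-mode-sketch").getOrCreate()

# Optional Spark schema supplied as a JSON file (employee_schema.json above).
with open("employee_schema.json") as f:
    schema = StructType.fromJson(json.load(f))

# Read every part file under the Sandbox Folder Path and apply the Limit.
df = (spark.read
      .schema(schema)
      .csv("/sandbox/sales/")   # File Type = CSV
      .limit(5000))             # Limit = 5000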
Platform Mode
File Type: File type to read. Example: PARQUET.
Sandbox Name: Sandbox name for the selected file type. Example: employee_data.
Sandbox File: File name (auto-filled after selecting the sandbox). Example: employee_2025.parquet.
Limit: Maximum number of records to read. Example: 1000.
Query: Spark SQL query, with inputDf as the table name. Example: SELECT * FROM inputDf.
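In Platform mode the selected sandbox file is loaded into a DataFrame that the Query field can reference as inputDf. A minimal PySpark sketch of that pattern, assuming the component simply registers the DataFrame as a temporary view (file name and limit taken from the example values above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("platform-mode-sketch").getOrCreate()

# File Type = PARQUET, Sandbox File = employee_2025.parquet, Limit = 1000.
input_df = spark.read.parquet("employee_2025.parquet").limit(1000)

# Expose the DataFrame under the table name the Query field expects.
input_df.createOrReplaceTempView("inputDf")

result = spark.sql("SELECT * FROM inputDf")
result.show()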
Column Filtering
You can select specific columns instead of retrieving the full dataset.
Source Field: Column name in the file. Example: employee_id.
Destination Field: Alias for the column. Example: emp_id.
Column Type: Data type of the column. Example: STRING.
Additional Options:
Upload File: Upload CSV/JSON files (<2 MB) to auto-populate schema.
Download Data: Export schema structure in JSON format.
Delete Data: Clear all column filter details.
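Column filtering amounts to selecting, renaming, and casting columns. A minimal PySpark sketch of the equivalent transformation for the example row above (employee_id renamed to emp_id as STRING); the sample DataFrame exists only to keep the sketch self-contained:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("column-filter-sketch").getOrCreate()
input_df = spark.createDataFrame([(101, "Sales")], ["employee_id", "team"])

# Keep only the filtered column, apply the alias, and cast to the declared type.
filtered_df = input_df.select(
    F.col("employee_id").cast("string").alias("emp_id")
)
filtered_df.show()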
Partition Columns
Enter the name of a partitioned column to read data from specific partitions.
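Naming the partition column lets Spark prune all other partitions when the data is stored in a partitioned layout. A minimal PySpark sketch, assuming a dataset partitioned by a hypothetical region column (path and value are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-sketch").getOrCreate()

# A filter on the partition column lets Spark skip the other partitions.
df = (spark.read
      .parquet("/sandbox/sales/")
      .where("region = 'EMEA'"))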
File Type-Specific Behavior
CSV
Header: Enable if the first row contains column names.
Infer Schema: Automatically detect schema.
Multiline: Enable for multiline CSV records.
Schema: Provide Spark schema (JSON) to filter bad records.
To handle bad records, map the failover event in the Basic Information tab.
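These switches match the standard Spark CSV reader options. A minimal PySpark sketch, assuming the component maps them onto spark.read options (the path and schema are illustrative):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.appName("csv-sketch").getOrCreate()

# Explicit schema; alternatively use .option("inferSchema", "true").
schema = StructType([
    StructField("employee_id", IntegerType()),
    StructField("team", StringType()),
])

df = (spark.read
      .option("header", "true")      # Header: first row holds column names
      .option("multiLine", "true")   # Multiline: records may span lines
      .schema(schema)
      .csv("/sandbox/sales/"))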
JSON
Multiline: Enable for multiline JSON.
Charset: Specify encoding (UTF-8, ISO-8859-1, etc.).
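The JSON switches map onto the Spark JSON reader in the same way. A minimal PySpark sketch under that assumption (the path is illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-sketch").getOrCreate()

df = (spark.read
      .option("multiLine", "true")   # Multiline: a record may span several lines
      .option("encoding", "UTF-8")   # Charset
      .json("/sandbox/events/"))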
PARQUET
No extra fields required.
AVRO
Compression: Select Deflate or Snappy (default).
Compression Level: Available when Deflate is selected (0–9).
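In plain Spark, the Avro codec and level are session settings (and normally affect writes rather than reads). A minimal sketch of those settings, on the assumption that the component exposes the same codecs; the path is illustrative and the spark-avro package must be available:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("avro-sketch").getOrCreate()

# Codec and level as in the Compression / Compression Level fields.
spark.conf.set("spark.sql.avro.compression.codec", "deflate")
spark.conf.set("spark.sql.avro.deflate.level", "5")   # 0-9, deflate only

df = spark.read.format("avro").load("/sandbox/events/")   # needs spark-avro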
XML
Infer Schema: Enable to detect column schema.
Path: File path.
Root Tag: Root element in XML.
Row Tags: Row-level tag(s).
Join Row Tags: Enable to join multiple row tags.
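XML reading in Spark typically goes through the spark-xml package, where the row tag controls how records are split. A minimal sketch under that assumption (tags and path are illustrative; in spark-xml the root tag is mainly a write-side option):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("xml-sketch").getOrCreate()

df = (spark.read
      .format("xml")                    # provided by the spark-xml package
      .option("rowTag", "employee")     # Row Tags: element treated as one record
      .option("inferSchema", "true")    # Infer Schema
      .load("/sandbox/employees.xml"))  # Path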
ORC
Push Down: Predicate pushdown option.
True: Enables pushdown, improving query performance by filtering at the storage level.
False: Filtering happens after load (slower).
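Predicate pushdown for ORC corresponds to a standard Spark session setting plus a filter on the read. A minimal PySpark sketch, assuming the component toggles that configuration (path and filter are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc-sketch").getOrCreate()

# Push Down = True: let the ORC reader evaluate filters at the storage level.
spark.conf.set("spark.sql.orc.filterPushdown", "true")

df = spark.read.orc("/sandbox/sales_orc/").where("team = 'Sales'")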
Example Query
SELECT
    team AS Team,
    AVG(monthly_salary) AS Average_Salary,
    AVG(experience) AS Average_Experience
FROM inputDf
GROUP BY team;
Notes
Fields marked with (*) are mandatory.
Either table or query must be specified (except for SFTP Reader).
Ensure no data type mismatch in Column Filter definitions.
Available fields vary by File Type selection.