S3 Writer

The S3 Writer component writes data to an Amazon S3 bucket. It supports multiple file formats, save modes, and partitioning options. Authentication is managed using AWS credentials (Access Key ID and Secret Access Key).
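
The component is configured through the UI, but its behavior maps closely onto a standard Spark write. The sketch below is a rough, hypothetical PySpark equivalent, assuming a Spark runtime with the hadoop-aws (S3A) connector on the classpath; the bucket, credentials, and path are placeholders, and the component's internals may differ.

```python
from pyspark.sql import SparkSession

# S3A reads credentials from the Hadoop configuration; the component collects
# the same values (Access Key, Secret Key) in its Meta Information tab.
spark = (
    SparkSession.builder.appName("s3-writer-sketch")
    .config("spark.hadoop.fs.s3a.access.key", "AKIA...")   # Access Key ID
    .config("spark.hadoop.fs.s3a.secret.key", "********")  # Secret Access Key
    .getOrCreate()
)

df = spark.createDataFrame(
    [(101, "2023-01-01"), (102, "2023-01-02")],
    ["customer_id", "date"],
)

# File Type = PARQUET, Save Mode = Append, Table = sales_data
df.write.mode("append").parquet("s3a://my-data-bucket/sales_data")
```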

Configuration Sections

The S3 Writer component's configuration is organized into the following sections:

  • Basic Information

  • Meta Information

  • Resource Configuration

  • Connection Validation

Meta Information Tab

| Parameter | Description | Example |
|-----------|-------------|---------|
| Bucket Name | Name of the S3 bucket where data will be written. | my-data-bucket |
| Access Key | AWS Access Key ID used to authenticate. | AKIA... |
| Secret Key | AWS Secret Access Key paired with the Access Key ID. | ******** |
| Table | S3 object path or logical table name where data is written. | sales_data |
| Region | AWS region where the bucket is located. | us-east-1 |
| File Type | Output file format. Supported: CSV, JSON, PARQUET, AVRO, ORC. | PARQUET |
| Save Mode | Defines write behavior. Options: Append, Overwrite (see Save Mode Options). | Append |
| Schema File Name | Spark schema file (JSON format) uploaded for the data; see the schema sketch after this table. | schema.json |
| Column Filter | Columns to write, each with a source name, alias, and data type. | See Column Filtering. |
| Partition Columns | Columns used to partition data in the S3 bucket. | date, region |
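
The Schema File Name parameter expects a Spark schema serialized as JSON. The snippet below is an illustration with hypothetical field names: it builds such a JSON document and verifies that Spark can parse it back into a StructType.

```python
import json
from pyspark.sql.types import StructType

# A hypothetical schema.json in Spark's JSON schema format.
schema_json = """
{
  "type": "struct",
  "fields": [
    {"name": "customer_id", "type": "string", "nullable": true, "metadata": {}},
    {"name": "date",        "type": "date",   "nullable": true, "metadata": {}}
  ]
}
"""

# StructType.fromJson turns the parsed document into a usable Spark schema.
schema = StructType.fromJson(json.loads(schema_json))
print(schema.simpleString())  # struct<customer_id:string,date:date>
```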

Save Mode Options

  • Append: Adds new files alongside the existing data under the target path; existing files are left untouched.

  • Overwrite: Replaces the existing data under the target path with the newly written data.
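
In Spark terms, the two options correspond to the writer's save mode (paths and data below are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "value"])

# Append keeps existing files and writes new files next to them.
df.write.mode("append").parquet("s3a://my-bucket/sales_data")

# Overwrite first removes whatever is under the target path, then writes.
df.write.mode("overwrite").parquet("s3a://my-bucket/sales_data")
```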

Column Filtering

The Column Filter section lets you select specific columns to write to S3.

| Field | Description | Example |
|-------|-------------|---------|
| Name | Column name from upstream data. | customer_id |
| Alias | Alias name for the column. | cust_id |
| Column Type | Data type of the column. | STRING |

Additional Options:

  • Upload: Uploads a CSV/JSON/Excel file to auto-populate column names.

  • Download Data: Exports the schema mapping in JSON format.

  • Delete Data: Clears all column filter mappings.
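
In Spark terms, a column filter amounts to a select with renames and casts. A hypothetical sketch mirroring the table above:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(101, "2023-01-01")], ["customer_id", "date"])

# Name = customer_id, Alias = cust_id, Column Type = STRING
filtered = df.select(
    F.col("customer_id").cast("string").alias("cust_id"),
)
filtered.printSchema()  # root |-- cust_id: string (nullable = true)
```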

Partitioning

Partitioning creates separate folders in the S3 bucket for each unique value of the partition column(s). This improves query performance and data organization.

Example: partitioning by a date column produces one folder per date value:

s3://my-bucket/sales_data/date=2023-01-01/
s3://my-bucket/sales_data/date=2023-01-02/
s3://my-bucket/sales_data/date=2023-01-03/

You can add multiple partition columns by clicking Add Column Name.
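
A hypothetical Spark equivalent that produces the folder layout above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "2023-01-01"), (2, "2023-01-02")],
    ["customer_id", "date"],
)

# partitionBy creates one date=<value>/ folder per distinct value of date.
df.write.mode("append").partitionBy("date").parquet("s3a://my-bucket/sales_data")
```

Note that Spark encodes each partition column in the folder name and omits it from the data files themselves.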

Example Configurations

Example 1: Writing Data in CSV Format

Bucket Name: my-data-bucket
Access Key: AKIA...
Secret Key: ********
Table: transactions
Region: us-east-1
File Type: CSV
Save Mode: Overwrite
Column Filter:
  - Name: customer_id
    Alias: cust_id
    Column Type: STRING
Partition Columns: date
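
A rough, hypothetical PySpark equivalent of this configuration (S3A credentials are assumed to be set as in the earlier sketch):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(101, "2023-01-01"), (102, "2023-01-02")],
    ["customer_id", "date"],
)

# Column Filter: customer_id -> cust_id (STRING); Partition Columns: date
(
    df.select(F.col("customer_id").cast("string").alias("cust_id"), F.col("date"))
    .write.mode("overwrite")
    .partitionBy("date")
    .option("header", True)
    .csv("s3a://my-data-bucket/transactions")
)
```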

Example 2: Writing Data in Parquet with Append Mode

Bucket Name: analytics-bucket
Access Key: AKIA...
Secret Key: ********
Table: employee_data
Region: us-west-2
File Type: PARQUET
Save Mode: Append
Partition Columns: department, year

This configuration appends data into S3, partitioned by department and year.
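
The rough PySpark equivalent (hypothetical, with placeholder data; the component builds the write internally):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("Alice", "engineering", 2023), ("Bob", "sales", 2023)],
    ["name", "department", "year"],
)

# Save Mode = Append, partitioned by department and year
(
    df.write.mode("append")
    .partitionBy("department", "year")
    .parquet("s3a://analytics-bucket/employee_data")
)
```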

Notes

  • Ensure the provided IAM user has write permissions (s3:PutObject) on the target bucket; a minimal policy sketch follows these notes.

  • Use partitioning for large datasets to improve downstream query performance.

  • For production workloads, prefer Parquet or ORC due to better compression and query performance.
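
As a starting point for the first note, here is a minimal, hypothetical policy document (the bucket name is a placeholder; Overwrite mode typically also needs s3:DeleteObject, and most clients need s3:ListBucket on the bucket itself, so extend as required):

```python
import json

# Minimal, hypothetical policy granting object writes to the target bucket.
# Extend with s3:DeleteObject / s3:ListBucket as noted above if required.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::my-data-bucket/*",
        }
    ],
}
print(json.dumps(policy, indent=2))
```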