S3 Writer
The S3 Writer component writes data to an Amazon S3 bucket. It supports multiple file formats, save modes, and partitioning options. Authentication is managed using AWS credentials (Access Key ID and Secret Access Key).
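Because the component accepts a Spark schema file, it effectively performs a Spark-style S3 write. As a rough sketch of what the credential fields correspond to, here is how the same values would be applied to a hand-built PySpark session using the standard Hadoop S3A configuration keys (illustrative only; the component manages the session itself):

from pyspark.sql import SparkSession

# Illustrative: fs.s3a.access.key / fs.s3a.secret.key are the standard
# Hadoop S3A settings corresponding to the Access Key and Secret Key fields.
spark = (
    SparkSession.builder
    .appName("s3-writer-sketch")
    .config("spark.hadoop.fs.s3a.access.key", "AKIA...")
    .config("spark.hadoop.fs.s3a.secret.key", "********")
    .getOrCreate()
)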
Configuration Sections
The S3 Writer configuration is organized into the following sections:
Basic Information
Meta Information
Resource Configuration
Connection Validation
Meta Information Tab
Bucket Name: Name of the S3 bucket where data will be written. Example: my-data-bucket
Access Key: AWS Access Key ID. Example: AKIA...
Secret Key: AWS Secret Access Key. Example: ********
Table: S3 object path or logical table name where data is written. Example: sales_data
Region: AWS region where the bucket is located. Example: us-east-1
File Type: Output file format. Supported: CSV, JSON, PARQUET, AVRO, ORC. Example: PARQUET
Save Mode: Defines write behavior. Options: Append, Overwrite. Example: Append
Schema File Name: Upload a Spark schema file (JSON format) for the data; see the sample schema after this list. Example: schema.json
Column Filter: Select columns to write, specifying source name, alias, and data type. See Column Filtering.
Partition Columns: Columns used to partition data in the S3 bucket. Example: date, region
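A Spark schema file typically uses the JSON layout produced by StructType.json(). A minimal sketch of a schema.json for a hypothetical three-column table (the column names and types are examples only):

{
  "type": "struct",
  "fields": [
    { "name": "customer_id", "type": "string", "nullable": true, "metadata": {} },
    { "name": "amount", "type": "double", "nullable": true, "metadata": {} },
    { "name": "date", "type": "date", "nullable": true, "metadata": {} }
  ]
}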
Save Mode Options
Append: Adds new files alongside the existing data under the target path.
Overwrite: Deletes the existing data under the target path and replaces it with the new data.
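In Spark terms these map onto the DataFrame writer's save modes. A minimal sketch, assuming an upstream DataFrame df and an illustrative target path:

# Append: keeps whatever already exists under the path and adds new files
df.write.mode("append").parquet("s3a://my-data-bucket/sales_data/")

# Overwrite: removes existing data under the path before writing
df.write.mode("overwrite").parquet("s3a://my-data-bucket/sales_data/")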
Column Filtering
The Column Filter section allows selecting specific columns to write to S3.
Name: Column name from upstream data. Example: customer_id
Alias: Alias name for the column. Example: cust_id
Column Type: Data type of the column. Example: STRING
Additional Options:
Upload: Upload CSV/JSON/Excel to auto-populate column names.
Download Data: Export schema mapping in JSON format.
Delete Data: Clear all column filter mappings.
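The Name/Alias/Column Type mapping behaves like a select with a rename and a cast. A PySpark sketch, assuming an upstream DataFrame df and using the example values above:

from pyspark.sql.functions import col

# customer_id is read from upstream, cast to STRING, and written as cust_id
filtered = df.select(
    col("customer_id").cast("string").alias("cust_id"),
)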
Partitioning
Partitioning creates separate folders in the S3 bucket for each unique value of the partition column(s). This improves query performance and data organization.
Example: Partition by a date column:
s3://my-bucket/sales_data/date=2023-01-01/
s3://my-bucket/sales_data/date=2023-01-02/
s3://my-bucket/sales_data/date=2023-01-03/
You can add multiple partition columns by clicking Add Column Name.
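In Spark terms, partition columns are passed to the writer's partitionBy, which produces the folder layout shown above. A sketch, assuming a DataFrame df that contains a date column:

(
    df.write
    .mode("append")
    .partitionBy("date")  # one folder per distinct date value
    .parquet("s3a://my-bucket/sales_data/")
)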
Example Configurations
Example 1: Writing Data in CSV Format
Bucket Name: my-data-bucket
Access Key: AKIA...
Secret Key: ********
Table: transactions
Region: us-east-1
File Type: CSV
Save Mode: Overwrite
Column Filter:
- Name: customer_id
Alias: cust_id
Column Type: STRING
Partition Columns: date
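A rough PySpark equivalent of this configuration (illustrative only; note that the partition column must be among the written columns, so date is kept alongside the filtered column):

from pyspark.sql.functions import col

(
    df.select(
        col("customer_id").cast("string").alias("cust_id"),
        col("date"),  # needed for partitioning
    )
    .write
    .mode("overwrite")
    .partitionBy("date")
    .csv("s3a://my-data-bucket/transactions/", header=True)
)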
Example 2: Writing Data in Parquet with Append Mode
Bucket Name: analytics-bucket
Access Key: AKIA...
Secret Key: ********
Table: employee_data
Region: us-west-2
File Type: PARQUET
Save Mode: Append
Partition Columns: department, year
This configuration appends data into S3, partitioned by department and year.
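The rough PySpark equivalent (again illustrative):

(
    df.write
    .mode("append")
    .partitionBy("department", "year")
    .parquet("s3a://analytics-bucket/employee_data/")
)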
Notes
Ensure the provided IAM user has write permissions (s3:PutObject) for the target bucket; a minimal policy sketch follows these notes.
Use partitioning for large datasets to improve downstream query performance.
For production workloads, prefer Parquet or ORC due to better compression and query performance.
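A minimal IAM policy sketch granting the write permission mentioned above (the bucket name is illustrative; depending on the setup, actions such as s3:ListBucket and s3:AbortMultipartUpload may also be required):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-data-bucket/*"
    }
  ]
}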