SQL Component
The SQL Component applies SQL operations within a data pipeline. It lets developers and analysts transform and manipulate data by writing SQL queries directly against Spark DataFrames. The component sits between the extracted raw data and the transformed dataset, and its output can be passed on to downstream components in the pipeline.
The component also supports aggregate queries across streaming data, allowing users to perform advanced analytics such as group-level aggregations or window functions.
Key Capabilities
Perform transformations on Spark DataFrames using SQL.
Use aggregation functions on streaming datasets.
Select columns and optionally assign an alias and a data type to each.
Run queries in either Batch Query or Aggregate Query mode.
Pass results to downstream components in a pipeline.
Configuration Overview
All SQL Component configurations are organized into the following sections:
Basic Information
Meta Information
Resource Configuration
Note: When using Aggregate Query mode, the schema file must be provided in Spark JSON schema format.
Configuring Meta Information
Query Type
Two query types are available:
Batch Query
Use for one-time or standard transformations.
No schema file upload is required.
Aggregate Query
Use when applying aggregation functions on streaming data.
Requires uploading a Spark schema file in JSON format for the in-event data.
Schema File Name
Upload the schema file when Aggregate Query is selected.
File format: Spark JSON schema.
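As a rough illustration of what a Spark JSON schema file looks like, the sketch below builds one with Python's standard json module. The layout (a top-level "struct" with a list of "fields", each carrying "name", "type", "nullable", and "metadata") is the JSON form Spark uses for a StructType. The field names and types are hypothetical placeholders, not taken from any real in-event.

```python
import json

# Hypothetical in-event fields (assumption); replace with your own event fields.
FIELDS = [
    ("device_id", "string"),
    ("temperature", "double"),
    ("event_time", "timestamp"),
]


def build_spark_schema(fields):
    """Return a dict in the JSON layout Spark uses for a StructType."""
    return {
        "type": "struct",
        "fields": [
            {"name": name, "type": dtype, "nullable": True, "metadata": {}}
            for name, dtype in fields
        ],
    }


schema = build_spark_schema(FIELDS)

# Serialized text that could be saved as the schema file to upload.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Saving the printed text to a .json file gives a candidate for the Schema File Name upload. Check the field list against the actual structure of the in-event data before using it.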
Table Name
Provide the table name that will be used for SQL transformations.
Query
Enter the SQL query to be executed on the Spark DataFrame.
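The relationship between Table Name and Query can be sketched with a small group-level aggregation. The example below uses Python's built-in sqlite3 module purely as a local stand-in for Spark SQL, so it runs without a cluster; it is not how the component executes queries. The table name, columns, and rows are hypothetical, and the query sticks to standard SQL (GROUP BY, COUNT, AVG) that Spark SQL also accepts.

```python
import sqlite3

# Hypothetical value for the Table Name field (assumption).
TABLE_NAME = "sensor_events"

# Text of the kind that would be entered in the Query field.
QUERY = f"""
SELECT device_id,
       COUNT(*) AS reading_count,
       AVG(temperature) AS avg_temperature
FROM {TABLE_NAME}
GROUP BY device_id
ORDER BY device_id
"""

# sqlite3 is only a stand-in engine here; the component itself runs on Spark.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE_NAME} (device_id TEXT, temperature REAL)")
conn.executemany(
    f"INSERT INTO {TABLE_NAME} VALUES (?, ?)",
    [("a", 20.0), ("a", 22.0), ("b", 30.0)],
)
rows = conn.execute(QUERY).fetchall()
conn.close()

print(rows)
```

The point to note is that the name in the FROM clause must match the value given under Table Name.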
Selected Columns
Select column names from the table.
Optionally provide:
Alias name (renamed output field).
Data type for the column.
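One way to read the alias and data type options is as a SELECT list with AS and CAST expressions. The sketch below builds such a statement from a list of column entries. The helper build_select, the table name, and the column entries are all hypothetical; this documentation does not describe how the component applies these settings internally, so treat it as an interpretation only.

```python
def build_select(table, columns):
    """Build a SELECT statement from (column, alias, data_type) entries.

    alias and data_type may be None, mirroring the optional fields.
    """
    parts = []
    for name, alias, dtype in columns:
        # Cast only when a data type was supplied for the column.
        expr = f"CAST({name} AS {dtype})" if dtype else name
        # Rename the output field only when an alias was supplied.
        if alias:
            expr += f" AS {alias}"
        parts.append(expr)
    return f"SELECT {', '.join(parts)} FROM {table}"


# Hypothetical column selections (assumption).
sql = build_select(
    "sensor_events",
    [
        ("device_id", None, None),
        ("temperature", "temp_c", "DOUBLE"),
        ("event_time", "ts", None),
    ],
)
print(sql)
```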
Data Writing
When configured in Aggregate Query mode and connected to DB Sync, the SQL Component does not write data to the DB Sync event.
Monitoring
In Aggregate Query mode, monitoring metrics for the SQL Component are not displayed on the Monitoring page.
Running Aggregate Queries from a Clean State
If the SQL Component is set to Aggregate Query mode and you want to run it from a clean state, clear the existing event data first.
To do this:
Copy the existing SQL Component.
Paste the copied component to create a fresh instance.
Run the newly created instance.
This ensures that the query executes without including aggregations from previous runs.