SQL Component

The SQL Component applies SQL operations within a data pipeline. It lets developers and analysts transform and manipulate data by writing SQL queries directly against Spark DataFrames. The component sits between the extracted raw data and the transformed dataset, and its output can be passed on to downstream components in the pipeline.

The component also supports aggregate queries on streaming data, so users can apply group-level aggregations or window functions to incoming events.

Key Capabilities

  • Perform transformations on Spark DataFrames using SQL.

  • Use aggregation functions on streaming datasets.

  • Flexibly select and alias columns with custom data types.

  • Run queries in Batch Query or Aggregate Query mode.

  • Integrate with downstream components in a pipeline.

Configuration Overview

All SQL Component configurations are organized into the following sections:

  • Basic Information

  • Meta Information

  • Resource Configuration

Note: When using Aggregate Query mode, the schema file must be provided in Spark JSON schema format.

Configuring Meta Information

Query Type

Two query types are available:

  • Batch Query

    • Use for one-time or standard transformations.

    • No schema file upload is required.

  • Aggregate Query

    • Use when applying aggregation functions on streaming data.

    • Requires uploading a Spark schema file in JSON format for the in-event data.

Schema File Name

  • Upload the schema file when Aggregate Query is selected.

  • File format: Spark JSON schema.
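To illustrate the expected file format, the sketch below builds a small schema in the JSON layout that Spark uses to serialize a StructType. The field names (device_id, temperature, event_time) are hypothetical; replace them with the fields of your own in-event data. Only the Python standard library is used, so Spark is not needed to produce the file content.

```python
import json

# Hypothetical fields for an incoming event; replace with your own.
FIELDS = [
    ("device_id", "string"),
    ("temperature", "double"),
    ("event_time", "timestamp"),
]


def build_spark_schema(fields):
    """Return a dict in the JSON layout Spark uses for a StructType."""
    return {
        "type": "struct",
        "fields": [
            {"name": name, "type": dtype, "nullable": True, "metadata": {}}
            for name, dtype in fields
        ],
    }


schema = build_spark_schema(FIELDS)

# Serialize the schema; save this text as the schema file to upload.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

If PySpark is available, the same JSON text can also be obtained from an existing DataFrame with df.schema.json().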

Table Name

  • Provide the table name that will be used for SQL transformations.

Query

  • Enter the SQL query to be executed on the Spark DataFrame.
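The relationship between Table Name and Query can be sketched in PySpark as follows. This assumes the component exposes the incoming DataFrame to Spark SQL as a temporary view named after the Table Name field; that behavior is an assumption for illustration, not a description of the component's internals. The table name sensor_events, the column names, and the helper functions are hypothetical.

```python
def run_sql_on_dataframe(spark, df, table_name, query):
    """Expose `df` to Spark SQL under `table_name`, then run `query`.

    `spark` is expected to be a pyspark.sql.SparkSession and `df` a
    pyspark.sql.DataFrame. This mirrors an assumed behavior of the
    SQL Component; it is not the component's actual code.
    """
    # Register the DataFrame as a temporary view so SQL can refer to it by name.
    df.createOrReplaceTempView(table_name)
    # Run the query; the result is a new DataFrame.
    return spark.sql(query)


# Hypothetical Table Name and Query values.
TABLE_NAME = "sensor_events"
QUERY = (
    "SELECT device_id, AVG(temperature) AS avg_temperature "
    f"FROM {TABLE_NAME} "
    "GROUP BY device_id"
)


def demo():
    """Run the sketch on a tiny in-memory DataFrame (requires PySpark)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("sql-sketch").getOrCreate()
    df = spark.createDataFrame(
        [("d1", 20.0), ("d1", 22.0), ("d2", 30.0)],
        ["device_id", "temperature"],
    )
    run_sql_on_dataframe(spark, df, TABLE_NAME, QUERY).show()
    spark.stop()
```

Calling demo() on a machine with PySpark installed prints one averaged row per device. The name used in the FROM clause must match the Table Name value, otherwise Spark cannot resolve the table.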

Selected Columns

  • Select column names from the table.

  • Optionally provide:

    • Alias name (renamed output field).

    • Data type for the column.
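One way to picture the Selected Columns options is as a projection in which each column may be cast to a data type and renamed with an alias. The sketch below builds such a SELECT statement as a string. The helper build_select, the table name, and the column entries are hypothetical, and how the component actually applies aliases and data types is not specified here.

```python
def build_select(table_name, columns):
    """Build a SELECT statement from (column, alias, data_type) entries.

    `alias` and `data_type` may be None, matching the optional fields.
    Identifiers are inserted as-is, without quoting or escaping.
    """
    parts = []
    for column, alias, data_type in columns:
        # Apply the cast only when a data type was chosen.
        expr = f"CAST({column} AS {data_type})" if data_type else column
        # Apply the alias only when one was provided.
        if alias:
            expr = f"{expr} AS {alias}"
        parts.append(expr)
    return f"SELECT {', '.join(parts)} FROM {table_name}"


# Hypothetical selections: column name, optional alias, optional data type.
selected = [
    ("device_id", None, None),
    ("temperature", "temp_c", "DOUBLE"),
    ("event_time", "ts", None),
]

print(build_select("sensor_events", selected))
```

Here device_id passes through unchanged, temperature is cast and renamed to temp_c, and event_time is only renamed to ts.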

Data Writing

  • When configured in Aggregate Query mode and connected to DB Sync, the SQL Component does not write data to the DB Sync event.

Monitoring

  • In Aggregate Query mode, monitoring metrics for the SQL Component are not displayed on the Monitoring page.

Running Aggregate Queries from a Clean State

If the SQL Component is set to Aggregate Query mode and you want to run it from a clean state, clear the existing event data first.

To do this, create a fresh instance of the component:

  1. Copy the existing SQL Component.

  2. Paste the copied component to create a fresh instance.

  3. Run the newly created instance.

This ensures that the query executes without including aggregations from previous runs.