
SQL Component

The SQL transformer applies SQL operations to transform and manipulate data, providing flexibility and expressiveness for data transformations within a data pipeline.


The SQL component serves as a bridge between the extracted data and the desired transformed data, leveraging the power of SQL queries and database systems to enable efficient data processing and manipulation.

It also provides an option to apply aggregation functions across the complete streaming data processed by the component. With this component, the user can apply SQL transformations to Spark DataFrames.
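For intuition, the transformation behaves much like registering the in-event data as a temporary view and running a Spark SQL query over it. The PySpark sketch below is only illustrative; the DataFrame contents, table name (orders), and query are made-up examples, not component defaults:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-transform-sketch").getOrCreate()

# Illustrative in-event data; in a pipeline this would arrive from the previous component.
df = spark.createDataFrame(
    [("A-101", "north", 250.0), ("A-102", "south", 410.5)],
    ["order_id", "region", "amount"],
)

# The component effectively exposes the data under the configured Table Name
# and applies the configured Query against it.
df.createOrReplaceTempView("orders")
result = spark.sql("SELECT region, SUM(amount) AS total_amount FROM orders GROUP BY region")
result.show()
```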

All component configurations are classified broadly into the following sections:

  • Basic Information

  • Meta Information

  • Resource Configuration

Follow the given steps in the demonstration to configure the SQL transformation component.

Please Note: The schema file that can be uploaded here must be a Spark schema in JSON format.
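One way to produce such a file is to serialize a Spark StructType to JSON with PySpark and save the output. The sketch below assumes illustrative field names (order_id, region, amount); adjust them to match the actual in-event data:

```python
import json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Placeholder schema describing the in-event data (field names are examples).
schema = StructType([
    StructField("order_id", StringType(), True),
    StructField("region", StringType(), True),
    StructField("amount", DoubleType(), True),
])

# StructType.jsonValue() returns the Spark JSON schema representation,
# which can be saved and uploaded via the Schema File Name field.
with open("in_event_schema.json", "w") as f:
    json.dump(schema.jsonValue(), f, indent=2)
```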

Configuring the Meta Information of the SQL Component

  1. Query Type: There are two options available under this field:

    • Batch Query: When this option is selected, there is no need to upload a schema file.

    • Aggregate Query: When this option is selected, it is mandatory to upload the Spark schema of the in-event data in JSON format.

  2. Schema File Name: Upload the Spark schema file in JSON format when Aggregate Query is selected in the Query Type field.

  3. Table Name: Provide the table name.

  4. Query: Write the SQL query in this field (example queries are shown after this list).

  5. Selected Columns: Select the column name from the table, and provide the alias name and the desired data type for that column.
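As examples of the two Query Type options, the sketch below shows the kind of query each mode might carry. The table and column names (orders, region, amount) are illustrative and must match your own Table Name and in-event data:

```python
# Example Batch Query: transforms the incoming data as-is, so no schema file is required.
batch_query = """
SELECT order_id,
       region,
       amount * 1.18 AS amount_with_tax
FROM orders
"""

# Example Aggregate Query: aggregates over the streaming data processed by the component,
# so the Spark JSON schema of the in-event data must also be uploaded.
aggregate_query = """
SELECT region,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM orders
GROUP BY region
"""
```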

Please Note: When Using Aggregate Query Mode

Data Writing:

  • When configured for Aggregate Query mode and connected to DB Sync, the SQL component will not write data to the DB Sync event.

Monitoring:

  • In Aggregate mode, monitoring details for the SQL component will not be available on the monitoring page.

Running Aggregate Queries Freshly:

  • If you set the SQL component to Aggregate Query mode and want to run it afresh, clearing the existing event data is recommended. To achieve this:

    1. Copy the component.

    2. Paste the copied component to create a fresh instance.

  • Running the copied component ensures the query runs without including aggregations from previous runs.
