Sqoop Executor
The Sqoop Executor is a pipeline component designed to efficiently transfer data between Hadoop (Hive/HDFS) and structured data stores such as relational databases (e.g., MySQL, Oracle, SQL Server).
It leverages Apache Sqoop commands to support:
Importing data from relational databases into Hadoop.
Exporting data from Hadoop to relational databases.
Running SQL queries directly on source systems.
Exploring available databases and tables for integration tasks.
Configuration Sections
Configuration for the Sqoop Executor is organized into three sections:
Basic Information
Meta Information
Resource Configuration
Basic Information Tab
The Basic Information tab is the default view when configuring the component. It defines general execution and deployment properties.
Invocation Type (Required): Select the execution mode: Batch or Real-Time.
Deployment Type (Required): Displays the deployment type of the component (pre-selected).
Container Image Version (Required): Displays the Docker image version for the component (pre-selected).
Failover Event (Optional): Select an event to handle failure scenarios.
Batch Size (Required): Maximum number of records processed in one execution cycle (minimum: 10).
Meta Information Tab
The Meta Information tab defines database connection properties and Sqoop command execution details.
Username (Required): Username for connecting to the relational database.
Host (Required): Hostname or IP address of the database server.
Port (Required): Port number for the connection (default: 22).
Authentication (Required): Select the authentication type: Password or PEM/PPK File.
Password / PEM/PPK File (Conditional): Depending on the selected authentication type, provide a password or upload a PEM/PPK key file.
Command (Required): Enter the Sqoop command to run (import, export, eval, list-databases, etc.). See the common commands below.
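As an illustration, a completed Meta Information tab might look like the following; every value here is a hypothetical placeholder, not a recommended setting:

Username: sqoop_user
Host: 192.168.1.50
Port: 22
Authentication: Password
Password: ********
Command: sqoop import --connect jdbc:mysql://192.168.1.50/sales_db --username sqoop_user --password ******** --table orders --target-dir /user/hadoop/orders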
Common Sqoop Commands
Import Data
Import a relational database table into Hadoop (HDFS).
sqoop import \
--connect jdbc:mysql://hostname/database_name \
--username your_username \
--password your_password \
--table your_table \
--target-dir /user/hadoop/sqoop_data
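If the imported data should land directly in a Hive table instead of a plain HDFS directory, the same command can be extended with Sqoop's --hive-import option. This is a minimal sketch assuming Hive is available on the cluster; your_hive_table is a placeholder:

sqoop import \
--connect jdbc:mysql://hostname/database_name \
--username your_username \
--password your_password \
--table your_table \
--hive-import \
--hive-table your_hive_table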
Export Data
Export data from Hadoop into a relational database.
sqoop export \
--connect jdbc:mysql://hostname/database_name \
--username your_username \
--password your_password \
--table your_table \
--export-dir /user/hadoop/sqoop_data
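If the target table already contains rows, plain inserts can fail on duplicate keys. Sqoop's --update-key and --update-mode options turn the export into an upsert; the sketch below assumes the table's primary key column is named id:

sqoop export \
--connect jdbc:mysql://hostname/database_name \
--username your_username \
--password your_password \
--table your_table \
--export-dir /user/hadoop/sqoop_data \
--update-key id \
--update-mode allowinsert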
Evaluate Queries
Run SQL queries against the source database without performing an import or export.
sqoop eval \
--connect jdbc:mysql://hostname/database_name \
--username your_username \
--password your_password \
--query "SELECT * FROM your_table"
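eval is also handy as a quick sanity check before a full import, for example to confirm the expected row count:

sqoop eval \
--connect jdbc:mysql://hostname/database_name \
--username your_username \
--password your_password \
--query "SELECT COUNT(*) FROM your_table"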
List Databases
List all available databases on the source server.
sqoop list-databases \
--connect jdbc:mysql://hostname \
--username your_username \
--password your_password
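To drill into a specific database, the companion list-tables command enumerates its tables, which supports the table-exploration use case mentioned at the start of this section:

sqoop list-tables \
--connect jdbc:mysql://hostname/database_name \
--username your_username \
--password your_password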
Saving the Sqoop Executor
Click Save Component (Storage icon).
A success notification confirms the configuration has been saved.
Example Workflow
Configure a Sqoop Executor with an import command to bring transactional data from MySQL into Hadoop HDFS.
Process and transform the ingested data using Spark or Hive.
Configure a second Sqoop Executor with an export command to push the processed results back into MySQL for downstream analytics (see the sketch below).
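As a sketch of the two Command field values in this workflow, using placeholder connection details and hypothetical table and directory names (transactions, processed_results, /user/hadoop/processed_results):

# Step 1: import transactional data from MySQL into HDFS
sqoop import \
--connect jdbc:mysql://hostname/database_name \
--username your_username \
--password your_password \
--table transactions \
--target-dir /user/hadoop/sqoop_data

# Step 3: export processed results (written by Spark/Hive in step 2) back to MySQL
sqoop export \
--connect jdbc:mysql://hostname/database_name \
--username your_username \
--password your_password \
--table processed_results \
--export-dir /user/hadoop/processed_results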