Sqoop Executor

The Sqoop Executor is a pipeline component designed to efficiently transfer data between Hadoop (Hive/HDFS) and structured data stores such as relational databases (e.g., MySQL, Oracle, SQL Server).

It leverages Apache Sqoop commands to support:

  • Importing data from relational databases into Hadoop.

  • Exporting data from Hadoop to relational databases.

  • Running SQL queries directly on source systems.

  • Exploring available databases and tables for integration tasks.

Configuration Sections

Configuration of the Sqoop Executor is organized into the following tabs:

  • Basic Information

  • Meta Information

  • Resource Configuration

Basic Information Tab

The Basic Information tab is the default view when configuring the component. It defines general execution and deployment properties.

Field | Description | Required
------|-------------|---------
Invocation Type | Select the execution mode: Batch or Real-Time. | Yes
Deployment Type | Displays the deployment type of the component (pre-selected). | Yes
Container Image Version | Displays the Docker image version for the component (pre-selected). | Yes
Failover Event | Select an event to handle failure scenarios. | Optional
Batch Size | Maximum number of records processed in one execution cycle (minimum: 10). | Yes

Meta Information Tab

The Meta Information tab defines database connection properties and Sqoop command execution details.

Field | Description | Required
------|-------------|---------
Username | Username for connecting to the relational database. | Yes
Host | Hostname or IP address of the database server. | Yes
Port | Port number for the connection (default: 22). | Yes
Authentication | Select the authentication type: Password or PEM/PPK File. | Yes
Password / PEM/PPK File | Provide the password or upload the PEM/PPK key file, depending on the selected authentication type. | Conditional
Command | Enter the Sqoop command to run (import, export, eval, list-databases, etc.). | Yes
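
Before running a full import or export, it can be useful to verify connectivity with a lightweight command; for example, the Command field could hold a simple eval query (hostname, database, and credentials are placeholders).

sqoop eval \
  --connect jdbc:mysql://hostname/database_name \
  --username your_username \
  --password your_password \
  --query "SELECT 1"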

Common Sqoop Commands

Import Data

Import a relational database table into Hadoop (HDFS).

sqoop import \
  --connect jdbc:mysql://hostname/database_name \
  --username your_username \
  --password your_password \
  --table your_table \
  --target-dir /user/hadoop/sqoop_data
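
In practice, passing the password inline is often avoided. Sqoop can read it from a protected file with --password-file or prompt for it interactively with -P, and --num-mappers controls how many parallel map tasks perform the transfer. A minimal variant of the import above; the password-file path is illustrative.

sqoop import \
  --connect jdbc:mysql://hostname/database_name \
  --username your_username \
  --password-file /user/hadoop/.mysql_password \
  --table your_table \
  --target-dir /user/hadoop/sqoop_data \
  --num-mappers 4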

Export Data

Export data from Hadoop into a relational database.

sqoop export \
  --connect jdbc:mysql://hostname/database_name \
  --username your_username \
  --password your_password \
  --table your_table \
  --export-dir /user/hadoop/sqoop_data
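
If the target table may already contain some of the exported rows, Sqoop can update them instead of inserting duplicates. A sketch using --update-key (with --update-mode allowinsert for upsert-style behavior where the database supports it); the key column name is illustrative.

sqoop export \
  --connect jdbc:mysql://hostname/database_name \
  --username your_username \
  --password your_password \
  --table your_table \
  --export-dir /user/hadoop/sqoop_data \
  --update-key id \
  --update-mode allowinsert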

Evaluate Queries

Test SQL queries on the database without performing import/export.

sqoop eval \
  --connect jdbc:mysql://hostname/database_name \
  --username your_username \
  --password your_password \
  --query "SELECT * FROM your_table"

List Databases

List all available databases on the source server.

sqoop list-databases \
  --connect jdbc:mysql://hostname \
  --username your_username \
  --password your_password
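
List Tables

List the tables inside a specific database. Sqoop provides list-tables for this; the connection details are placeholders, as in the examples above.

sqoop list-tables \
  --connect jdbc:mysql://hostname/database_name \
  --username your_username \
  --password your_password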

Saving the Sqoop Executor

  • Click Save Component (Storage icon).

  • A success notification confirms the configuration has been saved.

Example Workflow

  1. Configure a Sqoop Executor with an import command to bring transactional data from MySQL into HDFS.

  2. Process and transform the ingested data using Spark or Hive.

  3. Configure another Sqoop Executor with an export command to push the processed results back into MySQL for downstream analytics (a condensed command-line sketch follows below).
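
A condensed command-line sketch of this workflow, assuming a hypothetical sales database; table, path, and credential names are illustrative.

# Step 1: import transactional data from MySQL into HDFS
sqoop import \
  --connect jdbc:mysql://mysql-host/sales \
  --username etl_user -P \
  --table transactions \
  --target-dir /user/hadoop/raw/transactions \
  --num-mappers 4

# Step 2: transform the raw data with Spark or Hive, writing the
# results to /user/hadoop/processed/daily_totals

# Step 3: export the processed results back into MySQL
sqoop export \
  --connect jdbc:mysql://mysql-host/sales \
  --username etl_user -P \
  --table daily_totals \
  --export-dir /user/hadoop/processed/daily_totals \
  --input-fields-terminated-by ','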