Sqoop Executer

Sqoop Executer is a tool designed to efficiently transfer data between Hadoop (Hive/HDFS) and structured data stores such as relational databases (e.g., MySQL, Oracle, SQL Server).

All component configurations are classified broadly into the following sections:

Basic Information Tab

Basic information tab

It is the default tab to open for the component while configuring it.

  • Invocation Type: Select an invocation mode out of ‘Real-Time’ or ‘Batch’ using the drop-down menu.

  • Deployment Type: It displays the deployment type for the reader component. This field comes pre-selected.

  • Container Image Version: It displays the image version for the docker container. This field comes pre-selected.

  • Failover Event: Select a failover Event from the drop-down menu.

  • Batch Size (min 10): Provide the maximum number of records to be processed in one execution cycle (Min limit for this field is 10).

Steps to configure the meta information of Sqoop Executer

Meta information tab of Sqoop Executer
  • Username: Enter the username for connecting to a relational database.

  • Host: Provide a host or IP address of the machine where your relational database server is running.

  • Port: Provide a Port number (the default number for these fields is 22).

  • Authentication: Select an authentication type from the drop-down:

    • Password: Enter the password.

    • PEM/PPK File: choose a file and provide the file name if the user selects this authentication option.

  • Command: Enter the relevant Sqoop command. In Apache Sqoop, a command is a specific action or operation that you perform using the Sqoop tool. Sqoop provides a set of commands to facilitate the transfer of data between Hadoop (or more generally, a Hadoop ecosystem component) and a relational database. These commands are used in Sqoop command-line operations to interact with databases, import data, export data, and perform various data transfer tasks.

Some of the common Sqoop commands include:

  • Import command: This command is used to import data from a relational database into Hadoop. You can specify source and target tables, database connection details, and various import options.

sqoop import --connect jdbc:mysql://hostname/database_name --username your_username --password your_password --table your_table --target-dir /user/hadoop/sqoop_data
  • Export Command: This command is used to export data from Hadoop to a relational database. You can specify source and target tables, database connection details, and export options.

sqoop export --connect jdbc:mysql://hostname/database_name --username your_username --password your_password --table your_table --export-dir /user/hadoop/sqoop_data
  • Eval Command: This command allows you to evaluate SQL queries and expressions without importing or exporting data. It's useful for testing SQL queries before running import/export commands.

sqoop eval --connect jdbc:mysql://hostname/database_name --username your_username --password your_password --query "SELECT * FROM your_table"
  • List Databases Command: This command lists the available databases on the source database server.

sqoop list-databases --connect jdbc:mysql://hostname --username your_username --password your_password

Last updated