This page covers configuration details for the MongoDB Reader component.
All component configurations are classified broadly into the following sections: Basic Information and Meta Information.
Please follow the demonstration to configure the component.
The MongoDB Reader reads data from a specified collection of a MongoDB database. It also provides an option to filter the data using a Spark SQL query.
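Conceptually, the reader's behavior resembles loading a collection with the MongoDB Spark Connector. The following is only a minimal sketch of that idea, not the platform's actual implementation; the connector version, package coordinates, and every connection value (host, credentials, database, and collection names) are assumptions.

```python
from pyspark.sql import SparkSession

# A minimal sketch of reading a MongoDB collection with Spark, assuming the
# MongoDB Spark Connector v10.x. All connection values below are placeholders.
spark = (
    SparkSession.builder
    .appName("mongodb-reader-sketch")
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:10.2.1")
    .getOrCreate()
)

df = (
    spark.read.format("mongodb")                 # connector v10.x source name
    .option("connection.uri", "mongodb://app_user:secret@10.0.0.12:27017")
    .option("database", "sales_db")              # maps to 'Database Name'
    .option("collection", "orders")              # maps to 'Collection Name'
    .load()
)

df.show(5)
```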
Steps to configure the component in the pipeline:
i) Drag & Drop the Mongo DB Reader on the Workflow Editor.
ii)Click on the dragged reader component to open the component properties tabs below.
iii) Basic Information: This is the default tab that opens for the MongoDB Reader while configuring the component.
a. Invocation Type: Select the running mode of the reader component from the drop-down menu: either ‘Real-Time’ or ‘Batch’.
b. Deployment Type: It displays the deployment type for the component. This field comes pre-selected.
c. Container Image Version: It displays the image version for the docker container. This field comes pre-selected.
d. Failover Event: Select a failover event from the drop-down menu.
e. Batch Size: Provide the maximum number of records to be processed in one execution cycle.
iv) Open the ‘Meta Information’ tab and fill in all the connection-specific details of MongoDB.
1. Connection Type: Select one of the connection type options: ‘Standard’, ‘SRV’, or ‘Connection String’ (the connection URI formats for each type are sketched below).
2. Host IP Address (*): Provide the IP address of the MongoDB host.
3. Port (*): Port number (It appears only with the ‘Standard’ Connection Type).
4. Username (*): Provide the username.
5. Password (*): Provide a valid password to access MongoDB.
6. Database Name (*): Provide the name of the database from where you wish to read data.
7. Collection Name (*): Provide the name of the collection.
8. Query: Insert a Spark SQL query (queries containing join statements are also supported; a sample query is sketched after this list).
9. Limit: Set a limit for the number of records.
10. Additional Parameters: Provide any additional connection parameters, if required.
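To illustrate how the Query and Limit fields behave, the sketch below continues the earlier example: two hypothetical collections are registered as temporary views, a join query is run with Spark SQL, and the result is capped. The collection and field names, as well as the assumption that the component evaluates the query over the loaded data in this way, are illustrative only.

```python
# Continues the previous sketch; `spark` is the same SparkSession, and both
# collections below are hypothetical.
def read_collection(name):
    return (
        spark.read.format("mongodb")
        .option("connection.uri", "mongodb://app_user:secret@10.0.0.12:27017")
        .option("database", "sales_db")
        .option("collection", name)
        .load()
    )

read_collection("orders").createOrReplaceTempView("orders")
read_collection("customers").createOrReplaceTempView("customers")

# A join query of the kind the Query field accepts; field names are made up.
result = spark.sql("""
    SELECT o.order_id, o.amount, c.customer_name
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE o.amount > 100
""")

# The Limit field caps the number of records returned, e.g. 1000.
result.limit(1000).show()
```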
The Meta Information fields vary based on the selected ‘Connection Type’ option.
i. Meta Information Tab with Standard as Connection Type
ii. Meta Information Tab with SRV as Connection Type
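For reference, the standard MongoDB URI formats corresponding to the three connection types are sketched below; all hosts, credentials, and option values are placeholders.

```python
# Standard: host IP and port are provided separately.
standard_uri = "mongodb://app_user:secret@10.0.0.12:27017/sales_db"

# SRV: no port; hosts are resolved through DNS SRV records.
srv_uri = "mongodb+srv://app_user:secret@cluster0.example.mongodb.net/sales_db"

# Connection String: a complete URI supplied as-is. Standard MongoDB URI
# options such as authSource, replicaSet, and readPreference are typical
# candidates for the 'Additional Parameters' field.
full_uri = (
    "mongodb://app_user:secret@10.0.0.12:27017/sales_db"
    "?authSource=admin&readPreference=secondaryPreferred"
)
```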
viii) Selected Columns: Instead of reading the complete table, users can read only specific columns via the ‘Selected Columns’ section. Select the columns you want to read; to rename a column, enter the new name in the alias name field, otherwise keep the alias name the same as the column name, and then select a Column Type from the drop-down menu.
or
Use the ‘Download Data’ and ‘Upload File’ options to select the desired columns.
1. Upload File: The user can upload existing system files (CSV, JSON) using the ‘Upload File’ icon (the file size must be less than 2 MB).
2. Download Data (Schema): Users can download the schema structure in JSON format by using the ‘Download Data’ icon (an illustrative example of the structure is sketched below).
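The exact JSON produced by ‘Download Data’ is platform-specific, but a selected-columns file generally needs the column name, alias, and type; the sketch below shows one hypothetical structure only.

```python
import json

# Hypothetical structure of a selected-columns file; the actual format produced
# by 'Download Data' may differ, so treat this purely as an illustration.
selected_columns = [
    {"name": "order_id",      "alias": "order_id",    "type": "string"},
    {"name": "amount",        "alias": "order_total", "type": "double"},
    {"name": "customer_name", "alias": "customer",    "type": "string"},
]

with open("mongodb_reader_columns.json", "w") as fh:
    json.dump(selected_columns, fh, indent=2)
```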
ix) After completing all the configurations, click the ‘Save Component in Storage’ icon provided in the reader configuration panel to save the component.
x) A notification message appears to confirm that the component has been configured successfully.