Pandas Query Component

Write basic python pandas query snippets for transformations.

All component configurations are classified broadly into 3 section

Please follow the demonstration to configure the component.

Pandas Query Component

This component helps the users to get data as per the entered query.

Steps to configure the component:

i) Drag and Drop the Pandas Query component to the Workflow Editor.

ii) The transformation component requires an input event (to get the data) and sends the data to an output event.

iii) Create two Events and drag them to the Workspace.

iv) Connect the input event and the output event to the component (The data in the input event can come from any Ingestion, Reader, or shared events).

v) Click the Pandas Query component to get the component properties tabs.

vi) The Basic Information tab opens by default.

a. Select an Invocation type from the drop-down menu to confirm the running mode of the Pandas Query component. Select ‘Real-Time’ or ‘Batch’ from the drop-down menu.

b.Deployment Type: It displays the deployment type for the component. This field comes pre-selected.

c.Container Image Version: It displays the image version for the docker container. This field comes pre-selected.

d.Failover Event: Select a failover Event from the drop-down menu.

e.Batch Size (min 10): Provide the maximum number of records to be processed in one execution cycle (Min limit for this field is 10).

vii) Open the ‘Meta Information’ tab and provide the connection-specific details.

a. Enter a valid data query to fetch data.

b. Provide the Table Name.

Note: The table name and query DF should be the same.

viii) Click the ‘Save Component in Storage’ icon to save the component properties.

ix) A Notification message appears to notify the successful update of the component.

Note: Pandas Query Example

SQL Query

Pandas Query

select id from airports where ident = 'KLAX'

airports [airports.ident == 'KLAX'].id

select * from airport_freq where airport_ident = 'KLAX' order by type

airports[(airports.iso_region == 'US-CA') & (airports.type == 'seaplane_base')]

select type, count(*) from airports where iso_country = 'US' group by type having count(*) > 1000 order by count(*) desc

airports[airports.iso_country == 'US'].groupby('type').filter(lambda g: len(g) > 1000).groupby('type').size().sort_values(ascending=False)

Last updated