Data Pipeline

1. Spark/Scala/Kafka upgrade [All components now run on the Spark and Scala versions below, with support for the Kafka version below]:

a. Upgraded the Spark version to 3.3.0

b. Upgraded the Scala version to 2.12.12

c. Upgraded the Kafka version to 3.1.0

2. Writers: HA (High Availability) support for:

a. MSSQL

b. Oracle

c. ClickHouse

d. PostgreSQL

e. MySQL

f. MongoDB

3. Readers: HA (High Availability) support for:

a. MSSQL

b. Oracle

c. ClickHouse

d. PostgreSQL

e. MySQL

f. MongoDB

4. Provided HA (High Availability) support for the Enrichment component.

5. Improved writer performance by including a schema definition [for all Spark writers].

6. ClickHouse Support:

a. Reader support

b. Writer support

c. Enrichment support

d. DB Sync support

7. SQL [Batch & Aggregation]: The SQL component now offers an option to apply an aggregation function to the complete streaming data that the component has processed.

8. Kafka Producer and Consumer Enhancement:

a. Introduced a Kafka Producer component to connect to external Kafka topics for different applications.

b. The Kafka Consumer now supports the SSL security type with host aliasing.

9. Intelligent Scaling: Scales a component up to its maximum number of instances to reduce data processing lag. The feature detects when higher data traffic requires scaling up and reduces the number of instances when there is no data traffic.

10. Connection Validation: Added an option to validate all connections before activating a pipeline. For example, if the pipeline contains an RDBMS reader, click the Connection Validation icon to validate its connection.

11. Data Channel enhancement: Introduced more broker-level insights, such as the number of partitions and disk usage.

12. Introduced a Cluster Event page to monitor the Kafka brokers and topics related to pipelines. Topics of deleted pipelines are also listed there.

13. Spark Mongo Component: Introduced the Connection String option, the HA Mongo option, and Spark schema upload to reduce write operations.

14. Implemented required changes to align with the new version of the DS Lab module that supports Data Preparation and Auto ML options.
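
The Intelligent Scaling behaviour described in item 9 can be pictured with a small decision function. This is only an illustrative sketch, not the product's implementation: the names ScalingPolicy and desired_instances, the lag threshold, and the instance limits are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy; none of these names or defaults come from the product."""
    min_instances: int = 1
    max_instances: int = 4
    lag_threshold: int = 1000  # unprocessed records that count as "lagging" (assumed)


def desired_instances(current: int, lag: int, incoming_rate: float,
                      policy: ScalingPolicy) -> int:
    """Return the instance count to run next, given the current load."""
    if incoming_rate == 0 and lag == 0:
        # No traffic and nothing queued: shrink to the minimum.
        return policy.min_instances
    if lag > policy.lag_threshold:
        # Lag is building up: jump to the maximum to drain it.
        return policy.max_instances
    # Otherwise keep the current count, clamped to the allowed range.
    return max(policy.min_instances, min(current, policy.max_instances))
```

With the assumed defaults, a component running 2 instances with a lag of 5000 records would be scaled to 4, and a component with no lag and no incoming data would be reduced to 1.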

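The idea behind Connection Validation (item 10) is to check every connection before a pipeline is activated. A minimal sketch of such a pre-activation check is shown here, assuming a plain TCP reachability test. The function name validate_connections and the endpoint mapping are hypothetical, and the product's own validation may check more than this, for example credentials.

```python
import socket
from typing import Dict, Tuple


def validate_connections(endpoints: Dict[str, Tuple[str, int]],
                         timeout: float = 2.0) -> Dict[str, bool]:
    """Try to open a TCP connection to each endpoint and report reachability.

    `endpoints` maps a component name (hypothetical) to a (host, port) pair.
    """
    results = {}
    for name, (host, port) in endpoints.items():
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            results[name] = False
    return results
```

A pipeline would be activated only if every value in the returned mapping is True. Note that this sketch confirms only that the host and port accept connections, not that a login would succeed.
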
Check out the given Video to get an overview of the latest Data Pipeline features.
