# HDFS Reader

HDFS stands for **Hadoop Distributed File System**. It is a distributed file system designed to store and manage large data sets in a reliable, fault-tolerant, and scalable way. HDFS is a core component of the Apache Hadoop ecosystem and is used by many big data applications.

This component reads files stored in HDFS.

All component configurations are broadly classified into three sections:

* [Basic Information](https://docs.bdb.ai/data-pipeline-4/components/component-base-configuration)
* Meta Information
* [Resource Configuration](https://docs.bdb.ai/data-pipeline-4/components/resource-configuration)

## **Configuring the Meta Information tab of the HDFS Reader**

* **Host IP Address:** Enter the host IP address for HDFS.
* **Port:** Enter the port number for HDFS.
* **Zone:** Enter the Zone for HDFS. A zone is a special directory whose contents are transparently encrypted on write and transparently decrypted on read.
* **File Type:** Select the File Type from the drop-down menu. The supported file types are:
  * ***CSV:*** The ***Header*** and ***Infer Schema*** fields are displayed when ***CSV*** is the selected File Type. Enable the ***Header*** option to read the header row of the file, and enable the ***Infer Schema*** option to infer the actual data type of each column in the CSV file.
  * ***JSON***: The **Multiline** and **Charset** fields are displayed when ***JSON*** is the selected File Type. Select the **Multiline** option if the file contains any multiline strings.
  * ***PARQUET***: No extra fields are displayed when PARQUET is the selected File Type.
  * ***AVRO***: This File Type provides two drop-down menus.
    * ***Compression***: Select either ***Deflate*** or ***Snappy***.
    * ***Compression Level***: This field appears only when the Deflate compression option is selected. It provides levels **0** to **9** in a drop-down menu.
  * ***XML:*** Select this option to read an XML file. When it is selected, the following fields are displayed:
    * **Infer schema:** Enable this option to infer the actual data type of each column.
    * **Path:** Provide the path of the file.
    * **Root Tag:** Provide the root tag from the XML files.
    * **Row Tags:** Provide the row tags from the XML files.
    * **Join Row Tags:** Enable this option to join multiple row tags.
* **Path:** Provide the path of the file.
* **Partition Columns**: Provide a unique key column name used to partition the data in Spark.
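
To make the relationship between these fields more concrete, here is a minimal Python sketch of how the Host IP Address, Port, and Path values could be combined into an `hdfs://` URI, and how the CSV and JSON fields might map onto Spark-style reader options. This is an illustration under assumptions, not the component's actual implementation: the function names `hdfs_uri` and `reader_options` are hypothetical, the option names (`header`, `inferSchema`, `multiLine`, `encoding`) are those used by Spark's built-in CSV and JSON readers, and the host, port, and path values are placeholders. For brevity, only CSV, JSON, and PARQUET are covered.

```python
from typing import Dict, Optional


def hdfs_uri(host: str, port: int, path: str) -> str:
    """Combine the Host IP Address, Port, and Path fields into an hdfs:// URI."""
    return f"hdfs://{host}:{port}/{path.lstrip('/')}"


def reader_options(
    file_type: str,
    header: bool = False,
    infer_schema: bool = False,
    multiline: bool = False,
    charset: Optional[str] = None,
) -> Dict[str, str]:
    """Map file-type fields onto Spark-style option names (assumed mapping)."""
    kind = file_type.lower()
    if kind == "csv":
        # Header and Infer Schema are the two CSV-specific fields.
        return {
            "header": str(header).lower(),
            "inferSchema": str(infer_schema).lower(),
        }
    if kind == "json":
        # Multiline and Charset are the two JSON-specific fields.
        options = {"multiLine": str(multiline).lower()}
        if charset:
            options["encoding"] = charset
        return options
    if kind == "parquet":
        # PARQUET shows no extra fields, so no options are needed.
        return {}
    raise ValueError(f"File type not covered by this sketch: {file_type}")


# Placeholder values, for illustration only.
uri = hdfs_uri("10.0.0.5", 8020, "/data/sales.csv")
options = reader_options("CSV", header=True, infer_schema=True)
print(uri)
print(options)
```

In a Spark job, a URI and option dictionary like these would typically be passed to a DataFrame reader, for example `spark.read.options(**options).csv(uri)`.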
