Hadoop Distributed File System

Hadoop is an ecosystem used to process big data that cannot be handled with traditional processing tools. Hadoop stores data in a distributed file system, which is quite different from a local file system such as the one on Windows. Let us learn more about this file system.

A file system manages files and directories; it is the way a machine stores and organizes data. Every file system keeps metadata about its files and directories. For example, when you move a file into a file system, the file's contents are written to disk and the file system records metadata about where on disk that file is located.

Now imagine your file is so large that it cannot fit on a single disk. You would then arrange two or three machines, divide your data across them, and manage the metadata of each partition yourself.

A distributed file system does exactly this: it partitions big data over multiple machines and properly manages the metadata of all those partitions. In Hadoop, the partitions are stored across the nodes of a Hadoop cluster.

Inside the Hadoop cluster, the metadata of all partitions is kept on a separate machine called the NameNode. Whenever a file is moved into the Hadoop cluster, Hadoop automatically divides the file into multiple partitions, loads those partitions onto multiple machines, and manages the metadata of each partition.
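To make this concrete, here is a toy model of the NameNode's role: keeping a map from each file to the partitions (blocks) it was split into and the machines that hold them. This is a simplified sketch; the class and method names (`NameNode`, `add_file`, `locate`) and the block/node identifiers are illustrative, not the real HDFS API.

```python
# Toy model of NameNode-style metadata management.
# All names here are simplified illustrations, not real HDFS classes.
class NameNode:
    def __init__(self):
        # file path -> list of (block_id, datanode) pairs
        self.metadata = {}

    def add_file(self, path, blocks):
        """Record which blocks a file was split into and where they live."""
        self.metadata[path] = blocks

    def locate(self, path):
        """Return the block locations for a file, as a client would ask."""
        return self.metadata[path]

nn = NameNode()
# A file split into three blocks, each stored on a different DataNode:
nn.add_file("/data/logs.txt", [("blk_1", "datanode1"),
                               ("blk_2", "datanode2"),
                               ("blk_3", "datanode3")])
print(nn.locate("/data/logs.txt"))
```

The key design point this sketch captures is the separation of concerns: the NameNode stores only metadata (which blocks exist and where), while the actual block data lives on the other machines in the cluster.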

In the Hadoop Distributed File System, the default block size is 128 MB. So if you load 500 MB of data, HDFS automatically divides it into 128 MB partitions:

partition 1: 128 MB
partition 2: 128 MB
partition 3: 128 MB
partition 4: 116 MB
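The arithmetic behind that split can be sketched in a few lines. This assumes the default 128 MB block size; the function name `split_into_blocks` is just for illustration.

```python
# Sketch of HDFS-style block splitting, assuming the default 128 MB block size.
BLOCK_SIZE_MB = 128

def split_into_blocks(file_size_mb):
    """Return the sizes (in MB) of the blocks a file is split into."""
    full_blocks = file_size_mb // BLOCK_SIZE_MB
    remainder = file_size_mb % BLOCK_SIZE_MB
    sizes = [BLOCK_SIZE_MB] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds the leftover data
    return sizes

print(split_into_blocks(500))  # → [128, 128, 128, 116]
```

Note that only the last block can be smaller than 128 MB; a 500 MB file therefore becomes three full blocks plus one 116 MB block, matching the list above.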

So finally, we can say that HDFS is the management of files and directories that are distributed across the nodes of a Hadoop cluster.
