How to Provide Elasticity to DataNode Storage by Integrating LVM with Hadoop💥

Lalita Sharma
6 min read · Nov 13, 2020

→ Increase or decrease the size of a partition in Linux ←

Hadoop + LVM = Storage Increased Elastically

Hello Enthusiasts….!!😃

🌟In this article, we are going to discuss how to integrate LVM with Hadoop in order to provide elasticity to DataNode storage. Here, elasticity means that whenever we need to change the storage a DataNode contributes to the Hadoop cluster, we can do it on the fly: without shutting down, restarting, or stopping the DataNode service, we can directly increase or decrease the DataNode's storage with a few commands. This saves time and keeps the cluster available.

🌀Task-Description:-

🔅Integrating LVM with Hadoop and providing Elasticity to DataNode Storage.

🌀Pre-requisite:-

→ Basic knowledge of RHEL-8 Linux, partitioning concepts, and LVM (Logical Volume Management)⚡

→ Knowledge of Hadoop clusters and how to set up Hadoop⚡

♠️ Logical Volume Management (LVM):-

LVM is a storage-management tool that supports allocating disks, striping, mirroring, and resizing logical volumes. First, a hard drive (or one of its partitions) is initialized as a physical volume (PV). One or more physical volumes are then pooled into a volume group (VG), which may span several drives. Finally, logical volumes (LVs) are carved out of the volume group, formatted with a filesystem, and mounted. In short: disk → PV (pvcreate) → VG (vgcreate) → LV (lvcreate) → mkfs + mount.

🌟Now, let’s perform the task —

1.) Set up the Hadoop cluster and configure the NameNode and DataNode
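The exact configuration depends on your Hadoop version and cluster layout. As a minimal sketch (assuming Hadoop 1.x-style property names, a hypothetical NameNode hostname ‘nn01’, and /dn as the DataNode storage directory we will later mount the LVM volume on), hdfs-site.xml on the DataNode would contain something like:

<property>
<name>dfs.data.dir</name>
<value>/dn</value>
</property>

and core-site.xml on both machines would point at the NameNode:

<property>
<name>fs.default.name</name>
<value>hdfs://nn01:9000</value>
</property>

(On Hadoop 2.x and later, the corresponding property names are dfs.datanode.data.dir and fs.defaultFS.)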

2.) Start the NameNode service

👉 # hadoop-daemon.sh start namenode

👉 # jps //verify that the NameNode JVM is running

👉 # hadoop dfsadmin -report //check the cluster report: 0 DataNodes connected yet

3.) Add a Hard Disk to the DataNode

New Hard Disk Added To The DataNode

⚡Here, I have added one new 20GiB hard disk to the DataNode machine, because we will share storage from this disk to the Hadoop cluster.
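⚡If you hot-add the disk while the virtual machine is running, Linux may not notice it immediately. One common way to make the new device appear without a reboot (assuming a SCSI/SATA virtual disk) is to rescan the SCSI hosts:

👉 # for host in /sys/class/scsi_host/host*; do echo "- - -" > $host/scan; done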

4.) Check the Hard Disk storage

We can check the hard disk storage by using command:-

👉 # fdisk -l

/dev/sdb: 20GiB
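In addition to fdisk, lsblk prints a compact tree of disks and partitions, which makes it easy to spot the new, still-unpartitioned device:

👉 # lsblk //the new disk should appear as sdb with no partitions under it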

5.) Create the Physical Volume from the new hard disk (/dev/sdb)

To create the physical volume, use command:-

👉 # pvcreate /dev/sdb

To display the physical volume created above, use command:-

👉 # pvdisplay /dev/sdb

Physical Volume created successfully

⚡From the above picture, we can see that a physical volume (PV) of size 20GiB (the size of the new hard disk) has been created and that it is not yet allocatable. Now we have to allocate this physical volume to some volume group.

6.) Create the Volume Group (VG)

To Create the Volume Group , use command:-

👉 # vgcreate vgdn /dev/sdb

Here, I have given the name ‘vgdn’ to the volume group created above.

To Display the Volume Group, use command:-

👉 # vgdisplay vgdn

Volume Group Created

To check whether the physical volume is now allocated, use command:-

👉 # pvdisplay /dev/sdb

⚡Thus, we can see that Allocatable is now yes, so the physical volume has been allocated to ‘vgdn’.

7.) Create a Logical Volume of size 10GiB

⚡Here, I am going to create one logical volume named ‘lvdn’ of size 10GiB and then mount it on the folder linked with the DataNode.

To create the logical volume, use command:-

👉 # lvcreate --size 10G --name lvdn vgdn

To Display the Logical Volume , use command:-

👉 # lvdisplay vgdn/lvdn

Created Logical volume of size 10GiB
Displayed logical volume

⚡So, you can see that we have successfully created one logical volume of size 10GiB.
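As a quick sanity check at any point, the short-form listing commands summarize the whole LVM stack at a glance:

👉 # pvs //physical volumes

👉 # vgs //volume groups

👉 # lvs //logical volumes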

8.) Format the Logical Volume

⚡In this step, we will format the logical volume created above with a filesystem so that we can use it to store data, i.e. as the DataNode storage associated with the Hadoop cluster.

To Format the logical volume, use command:-

👉 # mkfs.ext4 /dev/vgdn/lvdn

Formatted LV

9.) Mount the Logical Volume on the DataNode Directory (or folder)
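⚡The mount point must exist before mounting. If /dn was not already created while setting up the DataNode directory, create it first:

👉 # mkdir /dn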

To Mount the logical volume, use command:-

👉 # mount /dev/vgdn/lvdn /dn

Check that the logical volume is mounted on ‘/dn’ by using command:-

👉 # df -h

Mounted successfully
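Note that a mount made this way does not survive a reboot. To make it permanent, a line like the following (a sketch for our vgdn/lvdn setup) can be appended to /etc/fstab:

/dev/vgdn/lvdn /dn ext4 defaults 0 0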

10.) Start the DataNode Service

To Start the Service, use command:-

👉 # hadoop-daemon.sh start datanode

To check whether the DataNode is connected to the Hadoop cluster, use the report command:-

👉 # hadoop dfsadmin -report

DataNode

⚡So, we can clearly see that 9.78GiB (≈10GiB) of storage has been shared by the DataNode to the Hadoop cluster. The small shortfall from 10GiB is the space the ext4 filesystem keeps for its own metadata (journal, inode tables and so on).

✳️Now, we will elastically increase the storage without shutting down or stopping the DataNode.

11.) Increase the Logical Volume Size

⚡Here, we are going to increase the storage or logical volume size by 9GiB.

To Extend(or increase) the Storage , Use command:-

👉 # lvextend --size +9G /dev/vgdn/lvdn

To Display the Incremented logical volume size, use command:-

👉 # lvdisplay /dev/vgdn/lvdn

LV size — Increased by 9GiB

⚡Thus, The Size of Logical Volume has been extended from 10GiB to 19GiB.

🔴REMEMBER: We attached a hard disk of size 20GiB, created a physical volume of size 20GiB from it, added that PV to a volume group, and then created a logical volume of size 10GiB out of it. ☑️ POINT TO BE NOTED: this does not leave exactly 10GiB free in the volume group. LVM reserves a little space for its own metadata and rounds sizes to whole physical extents, so slightly less than 10GiB of free extents remain. If you try to extend the LV by the full 10GiB, lvextend fails with an ‼️Insufficient free space‼️ error; you can only increase the size by a bit less than 10GiB. In my case, I increased the LV by 9GiB✴️.
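⚡If you simply want to hand over all of the remaining free extents, whatever their exact size, lvextend can be told to use 100% of the free space in the volume group instead of a fixed size:

👉 # lvextend -l +100%FREE /dev/vgdn/lvdn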

⚡But if you check the size of the volume mounted on the /dn directory, it still shows 10GiB; the incremented 9GiB (19GiB in total) is not visible yet.

Check by using command:-

👉 # df -h

only 10GiB is shown here

>>It is still 10GiB because we have not yet resized the filesystem over the extended volume. In other words, we have to update the filesystem metadata (including its inode tables) so that it spans the whole partition (LV)⚡.

12.) Grow the Filesystem over the extended Logical Volume

⚡Here, we don't need to unmount the partition (LV): ext4 supports online growing, so we just have to resize the filesystem over the extended logical volume.

To resize the filesystem to the new size of the logical volume, use command:-

👉 # resize2fs /dev/vgdn/lvdn

Again, check whether the mounted size has increased by using command:-

👉 # df -h

Total LV size = 19GiB

⚡Now the total size is 19GiB: initially the logical volume was 10GiB, but after resizing the filesystem over the extended LV it has grown by 9GiB more. Note that resize2fs, when run without an explicit size, grows the filesystem to fill the whole LV.
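As a shortcut, steps 11 and 12 can be combined into one: lvextend's -r (--resizefs) option resizes the filesystem in the same step as the logical volume, so a single command would have done both:

👉 # lvextend -r --size +9G /dev/vgdn/lvdn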

13.) Check the Storage contributed by DataNode to the Hadoop-Cluster

DataNode

⚡Now we can say that the logical volume size has increased and the DataNode is contributing 19GiB of storage to the Hadoop cluster.

🌟Finally, the DataNode storage has been extended elastically through the LVM concept.
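The reverse direction is possible too, which is what makes the storage truly elastic. Unlike growing, however, an ext4 filesystem cannot be shrunk while it is mounted, so the DataNode has to be stopped for the duration. A minimal sketch, assuming we want to shrink ‘lvdn’ back to 10GiB (shrinking risks data loss if the stored data no longer fits, so take a backup first):

👉 # hadoop-daemon.sh stop datanode

👉 # umount /dn

👉 # e2fsck -f /dev/vgdn/lvdn //mandatory filesystem check before shrinking

👉 # resize2fs /dev/vgdn/lvdn 10G //shrink the filesystem first

👉 # lvreduce --size 10G /dev/vgdn/lvdn //then shrink the LV to match

👉 # mount /dev/vgdn/lvdn /dn

👉 # hadoop-daemon.sh start datanode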

🎗️ TASK COMPLETED SUCCESSFULLY !!🎗️

For any queries, leave your response📝 below👇, and if you really like my article and want more tech content, don't forget to clap👏👏 below👇.

THANK YOU FOR READING(*-*)🌻
