Does the CLIENT Go to the MASTER, or Directly to the SLAVE, to Read the Data?

Lalita Sharma · Oct 19, 2020

→ Hadoop Cluster Configuration done using the AWS Cloud ←

As we know, the Master (NameNode) provides the IP addresses of the DataNodes so that the Client can upload a file to them. Now, what happens when a Client asks the Master to read the data of a file?

Let’s discuss the problem statement:

✴️ Question: Does the Client go to the Master and read the file on the Slave via the Master, or does the Client go to the Slave directly and read the data?

✴️ Answer: The Client goes to the Slave directly and reads the data stored on the Slave.

✴️ Let’s prove it.

Pre-requisite: Create an AWS account, launch the EC2 instances, and install the JDK and Hadoop on all of them. Then configure every node of the Hadoop cluster (NameNode, DataNodes, and the Hadoop-Client) on the AWS Cloud.
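Just for reference, here is a minimal sketch of the key configuration done on every node; the IP address, port, and config-file path below are placeholders for this example (newer Hadoop versions use fs.defaultFS instead of fs.default.name, and the config directory may be /etc/hadoop/ or $HADOOP_HOME/conf/). Inside the <configuration> tag of core-site.xml on the NameNode, the DataNodes, and the Client:

<property>
  <name>fs.default.name</name>
  <value>hdfs://NameNode-Public-IP:9001</value>
</property>

This single property is what tells the Client and the DataNodes where the Master (NameNode) is.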

After configuring, start the Hadoop daemon on every node except the Client and check that it is running:

# hadoop-daemon.sh start namenode        (on the Master)

# hadoop-daemon.sh start datanode        (on each Slave)

# jps

Then, run one command to check the available DataNodes:

# hadoop dfsadmin -report

After this, the Hadoop-Client uploads a file named “we2.txt” (about 32 MB) into the HDFS cluster with a replication factor of 3. You can check it on the Web UI too.

Commands to upload the file and then list it:

# hadoop fs -put we2.txt /

# hadoop fs -ls /we2.txt
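A small optional check (just a sketch, using standard Hadoop commands): the replication factor can also be passed right at upload time, and you can ask the NameNode where the blocks of the file actually landed:

# hadoop fs -D dfs.replication=3 -put we2.txt /

# hadoop fsck /we2.txt -files -blocks -locations

The fsck report lists, for every block of we2.txt, the DataNode addresses holding its replicas; that block-location list is exactly the information the NameNode hands to a Client that wants to read the file.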

>> Now, the CLIENT requests the MASTER to read the data of the file “we2.txt”. So, run this command on the Hadoop-Client to read the file:

# hadoop fs -cat /we2.txt        ...(i)

>> Before running the above command: how do we check how the CLIENT reads the data, and from where it reads it, the Master or the DataNodes? Run the following command on all the DataNodes and on the Master (port 50010 is the port on which a DataNode serves the actual block data, so any traffic captured here is real file data):

# tcpdump -i eth0 tcp port 50010 -n -x        ...(ii)
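(Note: on newer AMIs the primary network interface may not be named eth0; in that case you can simply capture on all interfaces instead:)

# tcpdump -i any tcp port 50010 -n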

This way, when the CLIENT runs the read command, we will see that data packets are received by the DataNodes, while not a single packet is received by the Master (NameNode).

>>> When the CLIENT runs command (i), the output screen looks like this:

[Screenshot: Hadoop-Client]

>>> Now, the data packets are first received by DATANODE-1. That means some part of the file the Client wants to read is stored on DataNode-1, which is why it captured packets, as shown:

[Screenshot: DataNode-1]

>>> BUT what if, for some reason, DataNode-1 is powered off or shut down, or the file gets corrupted, or the system crashes? How does the Client read the data now??

THINK…

THINK______

THINK!!!!!!!!!!!!!!!

ANSWER: Whenever the Client uploads a file, HDFS creates replicas (copies) of it on several of the DataNodes available in the cluster. This happens by default, and if the Client wants a different replication factor, they can change it through the hdfs-site.xml file on the Hadoop-Client instance. The very interesting part is that the Client never knows which DataNode it is reading from: even if one instance shuts down, it automatically continues reading from another replica of the same file on another DataNode.
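For example (only a sketch; the exact location of hdfs-site.xml depends on your installation, e.g. /etc/hadoop/ or $HADOOP_HOME/conf/), the Client can set the default replication factor by adding this property inside the <configuration> tag of hdfs-site.xml, or change it later for an already-uploaded file with -setrep:

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

# hadoop fs -setrep 3 /we2.txt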

So, when DataNode-1 shuts down, the Client is not affected and does not even get any information about it; it just keeps reading the data from another replica of the same file “we2.txt” on DataNode-2, as shown below:

[Screenshot: DataNode-2]

In the above picture, DataNode-2 is receiving data packets, which indicates that the Hadoop-Client is now reading the file from DataNode-2.

CONCLUSION: The Client goes to the Slave directly and reads the data stored on the Slave, and when one DataNode goes OFF for any reason, the Client is not affected and keeps reading from another replica (copy) of the same file on another DataNode.

HENCE, Proved.

Hope you like my article! I really appreciate your reading. Any suggestions and feedback are highly appreciated, and don’t forget to 👏🏻 below 👇🏻.

THANK YOU (*~*)
