I have a 3-node HA cluster running in CentOS 8 VMs. I am using ZooKeeper 3.7.0 and Hadoop 3.3.1.
In my cluster I have 2 NameNodes: node1 is the active NameNode and node2 is the standby NameNode in case node1 goes down. The third node is the DataNode.
I start everything with the command:
start-dfs.sh
In node1 I had the following processes running: NameNode, QuorumPeerMain, JournalNode (and Jps).
In node2 I had the following processes running: NameNode, QuorumPeerMain, JournalNode, DataNode (and Jps).
My hdfs-site.xml configuration is the following:
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/datos/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/datos/datanode</value>
</property>
<property>
    <name>dfs.nameservices</name>
    <value>ha-cluster</value>
</property>
<property>
    <name>dfs.ha.namenodes.ha-cluster</name>
    <value>nodo1,nodo2</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ha-cluster.nodo1</name>
    <value>nodo1:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ha-cluster.nodo2</name>
    <value>nodo2:8020</value>
</property>
<property>
    <name>dfs.namenode.http-address.ha-cluster.nodo1</name>
    <value>nodo1:9870</value>
</property>
<property>
    <name>dfs.namenode.http-address.ha-cluster.nodo2</name>
    <value>nodo2:9870</value>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://nodo3:8485;nodo2:8485;nodo1:8485/ha-cluster</value>
</property>
The problem is that, since node2 is the standby NameNode, I didn't want it to run the DataNode process, so I killed it. I used kill -9 (I know that's not the right way; I should have used hdfs --daemon stop datanode).
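For reference, this is how I should have stopped it and then checked what the NameNode still sees (a sketch, assuming the commands are run as the HDFS user on node2):

```shell
# Stop the DataNode gracefully on node2 instead of kill -9,
# so it deregisters cleanly and flushes its state.
hdfs --daemon stop datanode

# Ask the NameNode which DataNodes it still considers live/dead.
hdfs dfsadmin -report
```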
Then I opened the Hadoop web UI to check how many DataNodes I had. In the web UI of node1 (the active NameNode), the Datanodes section showed only 1 DataNode, node3.
The problem is that the web UI of node2 (the standby NameNode) looked like this:
In case you can't see the image:
/default-rack/nodo2:9866 (192.168.0.102:9866) http://nodo2:9864 558s
/default-rack/nodo3:9866 (192.168.0.103:9866) http://nodo3:9864 1s
The node2 DataNode hasn't sent a heartbeat for 558 s, yet the standby NameNode still doesn't mark the node as dead.
Does anybody know why this happens?
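One thing worth checking: by default a NameNode declares a DataNode dead only after 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval, which with the defaults (300000 ms and 3 s) comes to 630 s, so 558 s is still inside the "stale but not dead" window. An illustrative hdfs-site.xml fragment to shrink that window for testing (these values are examples, not recommended production settings):

```xml
<!-- Illustrative values only: with these settings the dead-node
     interval becomes 2 * 30 s + 10 * 3 s = 90 s. -->
<property>
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <value>30000</value> <!-- milliseconds -->
</property>
<property>
    <name>dfs.heartbeat.interval</name>
    <value>3</value> <!-- seconds -->
</property>
```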
Copyright notice: content author 「Pablo Ochoa」, reproduced under the CC 4.0 BY-SA license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/70419926/why-does-a-datanode-doesn%c2%b4t-disappear-in-the-hadoop-web-site-when-the-datanode-j