Master cannot be switched the second time



I am using Alluxio with HDFS in HA mode. Everything runs in Docker containers.

I have three containers: docker1, docker2, and docker3. Both docker1 and docker2 run a NameNode and an AlluxioMaster. All three containers run a DataNode and an AlluxioWorker, and ZooKeeper runs on all three as well.


  1. The NameNode on docker1 is active, and the Alluxio leader master is also on docker1.

  2. I stopped docker1. The NameNode on docker2 switched to active successfully, and the Alluxio leader master also switched to docker2. No issue here.

  3. I started docker1 again, and all the workers reconnected successfully. Running "jps" on docker1 showed that AlluxioMaster was up. I also checked the AlluxioMaster log on docker1; no exceptions were thrown.

  4. Now the active NameNode is on docker2 and the Alluxio leader master is also on docker2.
    Important: I stopped docker2. The NameNode on docker1 became active, but the Alluxio master on docker1 did not take over.
    I checked the AlluxioMaster log on docker1 and found that the Alluxio master on docker1 had already switched to leader, but it was still trying to connect to the NameNode on docker2, as if Alluxio didn't know HDFS had already switched back to docker1.

Furthermore: at the beginning there was a mistake in Alluxio's HDFS HA configuration, and the first failover failed with exactly the same exception.
After I corrected the HDFS HA configuration, the first failover worked fine.
--> So I guess the cause of the problem is that the active HDFS NameNode switches to another node during the second failover while Alluxio doesn't notice.
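For context, this is roughly the shape of the hdfs-site.xml HA configuration the Alluxio masters need on their classpath; it is a minimal sketch, and the nameservice name "mycluster" and the logical NameNode names "nn1"/"nn2" are assumptions (the port 9000 matches the stack trace below):

```xml
<configuration>
  <!-- Logical nameservice covering both NameNodes ("mycluster" is an assumed name) -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>docker1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>docker2:9000</value>
  </property>
  <!-- Lets the HDFS client discover which NameNode is currently active -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

With this in place, clients address HDFS by the nameservice (e.g. hdfs://mycluster/) rather than a single NameNode host, and the failover proxy provider handles the active/standby switch.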


2017-10-23 01:42:35,952 INFO RetryInvocationHandler - Exception while invoking ClientNamenodeProtocolTranslatorPB.getListing over docker2/ after 6 failover attempts. Trying to failover after sleeping for 16646ms. No Route to Host from docker1/ to docker2:9000 failed on socket timeout exception: No route to host; For more details see:
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
at java.lang.reflect.Constructor.newInstance(
at org.apache.hadoop.ipc.Client.getRpcResponse(
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(
at com.sun.proxy.$Proxy11.getListing(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at com.sun.proxy.$Proxy12.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(
at org.apache.hadoop.hdfs.DFSClient.listPaths(
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(
at alluxio.underfs.hdfs.HdfsUnderFileSystem.listStatusInternal(
at alluxio.underfs.hdfs.HdfsUnderFileSystem.listStatus(
at alluxio.underfs.UnderFileSystemWithLogging$
at alluxio.underfs.UnderFileSystemWithLogging$
at alluxio.underfs.UnderFileSystemWithLogging.listStatus(
at alluxio.master.journal.ufs.UfsJournalSnapshot.getSnapshot(
at alluxio.master.journal.ufs.UfsJournalReader.updateInputStream(
at alluxio.master.journal.ufs.UfsJournalReader.readInternal(
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(
Caused by: No route to host
at Method)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(
at org.apache.hadoop.ipc.Client$Connection.access$3500(
at org.apache.hadoop.ipc.Client.getConnection(
... 34 more




Andrew Audibert
October 26, 2017, 7:14 PM

It looks like the HDFS client tries to fail over but gives up after 6 attempts. It might be the case that the client gives up before the HDFS failover is complete. Could you try increasing dfs.client.failover.max.attempts in the hdfs configuration?
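For example, in hdfs-site.xml (the values below are illustrative, not recommendations; tune them for how long your NameNode failover actually takes):

```xml
<property>
  <name>dfs.client.failover.max.attempts</name>
  <value>30</value> <!-- default is 15; raise it so retries outlast the failover -->
</property>
<property>
  <name>dfs.client.failover.sleep.max.millis</name>
  <value>30000</value> <!-- cap on per-attempt backoff; default is 15000 ms -->
</property>
```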

November 3, 2017, 6:40 AM

Thank you, but I don't understand. HDFS did switch to the other node successfully; the HDFS failover succeeded.
The Alluxio leader also switched successfully, but the new leader is still communicating with the previous HDFS NameNode, which is docker2/

Now the active HDFS NameNode is docker1/

Andrew Audibert
November 3, 2017, 5:57 PM

When Alluxio communicates with HDFS, it uses an HDFS client that is configured from the HDFS configuration files. The top of the stack trace is in HDFS client code, so the issue looks like HDFS client configuration.

The HDFS client gives the error message "Exception while invoking ClientNamenodeProtocolTranslatorPB.getListing over docker2/ after 6 failover attempts. Trying to failover after sleeping for 16646ms.". This indicates that the client tries to connect to HDFS a few times, then gives up. It takes some time for HDFS to fail over from docker2 to docker1. If the client gives up before the failover happens, you would see what you're seeing. To avoid this, the client needs to be configured so that its retry period is longer than the maximum failover time for the namenode. That's why I suggest increasing dfs.client.failover.max.attempts.
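To sanity-check the tuning, here is a rough sketch (in Python) of the worst-case retry window, assuming exponential backoff of roughly base × 2^attempt capped at a maximum per-attempt sleep; the real Hadoop client also adds random jitter, so treat this as an estimate, and the default values shown mirror the common dfs.client.failover.* defaults rather than your actual configuration:

```python
def total_failover_wait_ms(max_attempts=15, base_ms=500, max_sleep_ms=15000):
    """Rough upper bound on how long the HDFS client keeps retrying
    before giving up, assuming capped exponential backoff between attempts."""
    return sum(min(base_ms * (2 ** attempt), max_sleep_ms)
               for attempt in range(max_attempts))

# e.g. 6 attempts with the assumed defaults gives about a 30-second window
print(total_failover_wait_ms(max_attempts=6))
```

If this bound comes out shorter than the time your NameNode takes to become active, the client will give up mid-failover, which matches the symptom above.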

If you compare the timings of the log messages between Alluxio and the HDFS namenode, you could confirm whether the error you see happens before or after the namenode failover.






