Master cannot be switched the second time

Description

Background

I am using Alluxio with HDFS in HA mode. Everything is installed in Docker containers.

I have three containers: docker1, docker2, and docker3. docker1 and docker2 each run a NameNode and an AlluxioMaster. All three containers run a DataNode and an AlluxioWorker, and ZooKeeper runs on all three as well.

Steps

  1. The NameNode on docker1 is active, and the leading Alluxio master is also on docker1.

  2. I stopped docker1. The NameNode on docker2 switched to active successfully, and the leading Alluxio master also switched to docker2. No issue here.

  3. I started docker1, and all the workers reconnected successfully. I ran "jps" on docker1 and AlluxioMaster was up. I also checked the AlluxioMaster log on docker1; no exception was thrown.

  4. Now the active NameNode is on docker2 and the leading Alluxio master is also on docker2.
    Important: I stopped docker2. The NameNode on docker1 became active, but Alluxio on docker1 did not take over.
    I checked the AlluxioMaster log on docker1 and found that the Alluxio master on docker1 had already been elected leader, but it was still trying to connect to the NameNode on docker2, as if Alluxio did not know that HDFS had already failed over to docker1.

Furthermore: at the beginning there was a mistake in the Alluxio configuration for HDFS HA mode, and even the first failover failed with exactly the same exception.
After I corrected the HDFS HA configuration, the first failover worked fine.
--> So my guess is that on the second failover, the active NameNode of HDFS switches to another node while Alluxio does not notice.
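For context, a setup like the one described above would typically point Alluxio at the logical HDFS nameservice (not a specific NameNode host) and hand the HA client settings to Alluxio's HDFS client. A minimal sketch of the relevant alluxio-site.properties entries; the nameservice name "mycluster" and all paths are placeholders, not the reporter's actual values:

```properties
# alluxio-site.properties (sketch; "mycluster" and paths are placeholders)

# Point the root UFS and the journal at the logical nameservice so the
# embedded HDFS client can fail over between NameNodes.
alluxio.underfs.address=hdfs://mycluster/alluxio/data
alluxio.master.journal.folder=hdfs://mycluster/alluxio/journal

# Let Alluxio's HDFS client load the HA settings (dfs.nameservices,
# namenode RPC addresses, failover proxy provider) from hdfs-site.xml.
alluxio.underfs.hdfs.configuration=/opt/hadoop/etc/hadoop/hdfs-site.xml
```

If the UFS or journal URI names a single NameNode host:port instead of the nameservice, the client has no second NameNode to fail over to.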

Log

2017-10-23 01:42:35,952 INFO RetryInvocationHandler - Exception while invoking ClientNamenodeProtocolTranslatorPB.getListing over docker2/10.240.1.102:9000 after 6 failover attempts. Trying to failover after sleeping for 16646ms.
java.net.NoRouteToHostException: No Route to Host from docker1/10.240.1.101 to docker2:9000 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:758)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1485)
at org.apache.hadoop.ipc.Client.call(Client.java:1427)
at org.apache.hadoop.ipc.Client.call(Client.java:1337)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy11.getListing(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:588)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335)
at com.sun.proxy.$Proxy12.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1681)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1665)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:896)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:111)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:960)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:957)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:957)
at alluxio.underfs.hdfs.HdfsUnderFileSystem.listStatusInternal(HdfsUnderFileSystem.java:502)
at alluxio.underfs.hdfs.HdfsUnderFileSystem.listStatus(HdfsUnderFileSystem.java:293)
at alluxio.underfs.UnderFileSystemWithLogging$18.call(UnderFileSystemWithLogging.java:327)
at alluxio.underfs.UnderFileSystemWithLogging$18.call(UnderFileSystemWithLogging.java:324)
at alluxio.underfs.UnderFileSystemWithLogging.call(UnderFileSystemWithLogging.java:520)
at alluxio.underfs.UnderFileSystemWithLogging.listStatus(UnderFileSystemWithLogging.java:324)
at alluxio.master.journal.ufs.UfsJournalSnapshot.getSnapshot(UfsJournalSnapshot.java:88)
at alluxio.master.journal.ufs.UfsJournalReader.updateInputStream(UfsJournalReader.java:204)
at alluxio.master.journal.ufs.UfsJournalReader.readInternal(UfsJournalReader.java:163)
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:132)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:681)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:777)
at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:409)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1542)
at org.apache.hadoop.ipc.Client.call(Client.java:1373)
... 34 more

Environment

None

Activity

Andrew Audibert
October 26, 2017, 7:14 PM

It looks like the HDFS client tries to fail over but gives up after 6 attempts. It might be the case that the client gives up before the HDFS failover is complete. Could you try increasing dfs.client.failover.max.attempts in the hdfs configuration?
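For anyone trying this, the relevant client-side retry settings live in hdfs-site.xml. A sketch with the standard HDFS property names; the values are illustrative, not recommendations:

```xml
<!-- hdfs-site.xml (illustrative values) -->
<property>
  <!-- How many failover attempts the client makes before giving up -->
  <name>dfs.client.failover.max.attempts</name>
  <value>30</value>
</property>
<property>
  <!-- Base sleep between failover attempts (grows exponentially) -->
  <name>dfs.client.failover.sleep.base.millis</name>
  <value>500</value>
</property>
<property>
  <!-- Cap on the sleep between failover attempts -->
  <name>dfs.client.failover.sleep.max.millis</name>
  <value>15000</value>
</property>
```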

HalfLegend
November 3, 2017, 6:40 AM

Thank you, but I don't understand. HDFS successfully switched to another node; the HDFS failover did succeed.
The Alluxio leader also switched successfully, but the new leader is still communicating with the previous HDFS NameNode, which is docker2/10.240.1.102:9000.

Now the active NameNode of HDFS is docker1/10.240.1.101.

Andrew Audibert
November 3, 2017, 5:57 PM

When Alluxio communicates with HDFS, it uses an HDFS client that is configured via the HDFS configuration files. The top part of the stack trace is in the HDFS client code, so it looks like the issue is the HDFS client configuration.

The HDFS client gives the error message "Exception while invoking ClientNamenodeProtocolTranslatorPB.getListing over docker2/10.240.1.102:9000 after 6 failover attempts. Trying to failover after sleeping for 16646ms.". This indicates that the client tries to connect to HDFS a few times, then gives up. It takes some time for HDFS to fail over from docker2 to docker1. If the client gives up before the failover happens, you would see what you're seeing. To avoid this, the client needs to be configured so that its retry period is longer than the maximum failover time for the namenode. That's why I suggest increasing dfs.client.failover.max.attempts.
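To put a rough number on "retry period longer than the maximum failover time": with the HDFS client defaults (dfs.client.failover.max.attempts=15, sleep base 500 ms, sleep cap 15000 ms) and a simplified exponential-backoff model (the real Hadoop backoff adds random jitter, so this is only an estimate), the worst-case retry window can be sketched as:

```python
# Rough worst-case wait budget of the HDFS client across failover retries.
# Simplified model: sleep doubles each attempt, capped at max_sleep_ms.
# (The real Hadoop backoff is randomized; this is an estimate only.)
def total_retry_window_ms(max_attempts=15, base_ms=500, max_sleep_ms=15000):
    return sum(min(max_sleep_ms, base_ms * 2 ** i) for i in range(max_attempts))

print(total_retry_window_ms())  # 165500 ms, i.e. about 165 s with defaults
```

If the NameNode takes longer than this window to become active again, the client exhausts its attempts and the operation fails, which matches the behavior in the report.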

If you compare the timings of the log messages between Alluxio and the HDFS namenode, you could confirm whether the error you see happens before or after the namenode failover.

Assignee

Unassigned

Reporter

HalfLegend

Priority

Critical