We have an Alluxio instance which has two different HDFS instances mounted into it. A specific directory of the local HDFS is mounted at the root, i.e. it is the primary under filesystem. A remote HDFS instance is mounted under a specific subdirectory, in this case /aristotle.
While we can read data from the remote HDFS instance fine, since moving to 1.4.0 we are unable to write data and encounter a client hang when attempting to do so. Checking the logs on the worker that holds the blocks for the files to be persisted, we see the following error in worker.out:
This was working fine with 1.3.0. It looks to be an HDFS issue, but I don't really understand why it has only started happening with 1.4.0. In both installations we use the Hadoop 2.7 build of Alluxio.
Eventually the client will spit out the following error message, but it takes a long time for that to happen:
Please note that the line where the error occurs is not protected by a try-catch-finally block, so it is entirely possible that this error is also killing the persistence worker thread.
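To illustrate the concern about the unprotected line, here is a minimal Java sketch of how an uncaught exception ends a worker thread's loop, while a try-catch-finally around the same call lets the loop carry on. This is not Alluxio code: the class PersistLoopSketch, the persist and runLoop methods, and the simulated failure for paths under /aristotle are all hypothetical stand-ins made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class PersistLoopSketch {
    /** Hypothetical stand-in for persisting one file to the under filesystem. */
    static void persist(String path) {
        if (path.startsWith("/aristotle")) {
            // Simulated failure; the real error is whatever the worker logged.
            throw new IllegalStateException("simulated persist failure for " + path);
        }
    }

    /** Runs the loop on its own thread and returns how many paths it got through. */
    static int runLoop(List<String> paths, boolean guarded) throws InterruptedException {
        AtomicInteger handled = new AtomicInteger();
        Thread worker = new Thread(() -> {
            for (String p : paths) {
                if (guarded) {
                    try {
                        persist(p);
                    } catch (RuntimeException e) {
                        // A real worker would log the failure here and move on.
                    } finally {
                        handled.incrementAndGet();
                    }
                } else {
                    // Unprotected call: an exception here ends the whole thread.
                    persist(p);
                    handled.incrementAndGet();
                }
            }
        });
        // Keep the sketch quiet; by default the stack trace goes to stderr.
        worker.setUncaughtExceptionHandler((t, e) -> { });
        worker.start();
        worker.join();
        return handled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> paths = Arrays.asList("/a", "/aristotle/b", "/c");
        System.out.println("unguarded: " + runLoop(paths, false));
        System.out.println("guarded:   " + runLoop(paths, true));
    }
}
```

In the unguarded run the thread dies on the second path and /c is never attempted, which would match a client waiting indefinitely for persistence that no thread is left to perform. Whether this is what actually happens in the 1.4.0 worker is an assumption that would need confirming against the worker source.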
Linux nid00009 3.10.0-327.36.3.el7_3.1-cray_ari_athena_s_cos #1 SMP Mon Jan 9 19:14:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux (CentOS 7 w/ Cray kernel mods)
Subversion email@example.com:hortonworks/hadoop.git -r 26104d8ac833884c8776473823007f176854f2eb
Compiled by jenkins on 2016-02-10T06:18Z
Compiled with protoc 2.5.0
From source with checksum cf48a4c63aaec76a714c1897e2ba8be6