FileSystemContext closed in HA mode

Description

When we create an Alluxio HDFS FileSystem client for a URI, we check whether the host and port in the URI match the host and port of the master address in the default FileSystemContext. If either the host or the port doesn't match, we close and re-create the FileSystemContext. Anything that was using that FileSystemContext when this happens will see exceptions like

java.lang.NullPointerException
    at alluxio.client.file.FileSystemContext.acquireMasterClient(FileSystemContext.java)

and

alluxio.exception.status.FailedPreconditionException: Client is closed
    at alluxio.AbstractClient.retryRPC(AbstractClient.java)

In non-HA mode this is usually fine, since the master host and port shouldn't change. In HA mode, however, it causes problems: the master address known to the default FileSystemContext changes over time as leadership moves, and it is independent of the host:port passed in the URI. As a result, every FileSystem creation can invalidate the previously created FileSystems.

To address this in the short term, we can compare connect information instead of doing a simple host:port comparison. For HA mode, this would mean comparing the zk address, leader path, and election path. As long as those are the same between the previous FileSystemContext and the new FileSystem, we don't need to reset the FileSystemContext.
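
As a rough illustration of the proposed comparison, here is a minimal, self-contained sketch. It is not the actual Alluxio code: ConnectDetails, SingleMasterDetails, ZkDetails, and needsReset are hypothetical names introduced only for this example, and the zk address and path values are made up. The idea is that the context is reset only when the connect information differs, so in HA mode a change of leader alone does not trigger a reset.

```java
import java.util.Objects;

public class ConnectDetailsSketch {

    /** Hypothetical marker type: how a client locates the Alluxio master. */
    interface ConnectDetails {}

    /** Non-HA mode: a single fixed master address (hypothetical type). */
    static final class SingleMasterDetails implements ConnectDetails {
        private final String host;
        private final int port;

        SingleMasterDetails(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SingleMasterDetails)) return false;
            SingleMasterDetails that = (SingleMasterDetails) o;
            return port == that.port && Objects.equals(host, that.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port);
        }
    }

    /** HA mode: identified by zk address, leader path, and election path (hypothetical type). */
    static final class ZkDetails implements ConnectDetails {
        private final String zkAddress;
        private final String leaderPath;
        private final String electionPath;

        ZkDetails(String zkAddress, String leaderPath, String electionPath) {
            this.zkAddress = zkAddress;
            this.leaderPath = leaderPath;
            this.electionPath = electionPath;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ZkDetails)) return false;
            ZkDetails that = (ZkDetails) o;
            return Objects.equals(zkAddress, that.zkAddress)
                && Objects.equals(leaderPath, that.leaderPath)
                && Objects.equals(electionPath, that.electionPath);
        }

        @Override
        public int hashCode() {
            return Objects.hash(zkAddress, leaderPath, electionPath);
        }
    }

    /** Returns true only when the requested connect information differs from the current one. */
    static boolean needsReset(ConnectDetails current, ConnectDetails requested) {
        return !Objects.equals(current, requested);
    }

    public static void main(String[] args) {
        // Example values are invented for illustration.
        ConnectDetails context = new ZkDetails("zk1:2181,zk2:2181", "/leader", "/election");
        ConnectDetails sameCluster = new ZkDetails("zk1:2181,zk2:2181", "/leader", "/election");
        ConnectDetails otherCluster = new ZkDetails("zk9:2181", "/leader", "/election");

        System.out.println(needsReset(context, sameCluster));  // false: keep the existing context
        System.out.println(needsReset(context, otherCluster)); // true: different cluster, reset
    }
}
```

With this kind of comparison, which master currently holds leadership never enters the check, so creating a second FileSystem against the same HA cluster would leave the existing FileSystemContext open.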

Environment

None

Status

Assignee

Andrew Audibert

Reporter

Andrew Audibert

Labels

None

Components

Fix versions

Affects versions

1.7.1

Priority

Major