Better attempt to return block locations for blocks not in Alluxio workers

Description

Applications like Impala require block locations to be non-empty for data loading to work. Currently, Impala cannot read through the Alluxio Hadoop client when the data is not preloaded:

For files in an HDFS under file system, Alluxio returns the HDFS block location hostnames from BlockLocation::getNames(), but Impala requires the two-part name:port format, as indicated in the Javadoc of that method.

For files in a non-HDFS or remote HDFS under file system, Alluxio returns no BlockLocation, which makes Impala skip reading the file altogether.

This change updates the client to return full two-part locations for UFS block locations when an Alluxio worker is co-located with them. If no worker is co-located with the UFS block location, the client falls back to returning all worker hosts, so that applications can pick any one of them to read from.
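
The selection logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: BlockLocationSketch, WorkerAddress, and resolveNames are hypothetical names that do not come from the Alluxio code base, and the port used in the usage note is only an example value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; none of these names come from the Alluxio code base. */
final class BlockLocationSketch {

  /** Hypothetical stand-in for the network address of an Alluxio worker. */
  static final class WorkerAddress {
    final String host;
    final int dataPort;

    WorkerAddress(String host, int dataPort) {
      this.host = host;
      this.dataPort = dataPort;
    }

    /** Formats the address in the two-part name:port form. */
    String toName() {
      return host + ":" + dataPort;
    }
  }

  /**
   * Returns name:port entries for a block that is not cached in Alluxio.
   * Workers co-located with a UFS block host are preferred; if there are
   * none, every known worker is returned so the caller can pick one.
   */
  static List<String> resolveNames(List<String> ufsHosts, List<WorkerAddress> workers) {
    // Index workers by hostname, keeping their original order.
    Map<String, WorkerAddress> byHost = new LinkedHashMap<>();
    for (WorkerAddress worker : workers) {
      byHost.put(worker.host, worker);
    }

    // Keep only the UFS hosts that also run an Alluxio worker.
    List<String> names = new ArrayList<>();
    for (String ufsHost : ufsHosts) {
      WorkerAddress worker = byHost.get(ufsHost);
      if (worker != null) {
        names.add(worker.toName());
      }
    }

    // Fallback: no co-located worker, so advertise all workers.
    if (names.isEmpty()) {
      for (WorkerAddress worker : workers) {
        names.add(worker.toName());
      }
    }
    return names;
  }
}
```

For example, with workers on host-a and host-b (both using an example port of 29999), a block whose UFS locations are host-b and host-c resolves to host-b:29999 only, while a block located solely on host-c falls back to host-a:29999 and host-b:29999. If no workers are known at all, the sketch still returns an empty list.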

Environment

None

Status

Assignee

Bin Feng

Reporter

Bin Feng

Labels

None

Components

Affects versions

1.6.1

Priority

Major